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FOREWORD 


The  first  paragraph  of  a  letter  dated  13  July  1987  from  Professor  Jerry 
Bebernes  to  Dr.  Jagdish  Chandra  stated,  "The  University  of  Colorado  would 
like  to  host  the  1988  Annual  Army  Conference  on  Applied  Mathematics  and 
Computing.  Dave  Kassoy  and  I  would  assist  with  the  local  arrangements  and 
other  details  if  requested."  These  yearly  conferences  are  sponsored  by  the 
Army  Mathematics  Steering  Committee  (AMSC).  On  behalf  of  this  Committee, 
its  Chairman,  Dr.  Chandra,  was  pleased  to  accept  this  invitation  to  host 
the  sixth  meeting  in  this  series,  which  was  held  on  31  May  -  3  June  1988  in 
Boulder,  Colorado.  The  Local  Chai rpersons,  Drs.  Bebernes  and  Kassoy,  are 
to  be  commended  on  the  very  fine  job  they  did,  not  only  for  the  excellent 
visitor  arrangements,  but  also  for  their  help  in  selecting  speakers  and 
organizing  special  sessions. 

This  year  the  planned  program  of  the  conference  consisted  of  three  parts, 
namely:  (a)  Seven  one  hour  invited  addresses;  (b)  Thirty-two  half  hour 
solicited  talks  covering  the  following  topics:  Computational  Solid  and 
Structural  Mechanics,  Reactive  and  Compressible  Flows,  Symbolic  Computing 
and  Applications,  and  Parallel  Computing;  and  (c)  Thirty-two  contributed 
papers.  Most  of  the  latter  were  presented  by  Army  scientists  and  covered 
topics  directly  related  to  problems  they  face  in  their  laboratories. 

During  the  course  of  the  conference,  these  Army  scientists  had  an 
opportunity  to  discuss  problems  with  nationally  known  scientists.  Some  of 
these  were  the  invited  speakers  who  are  listed  below,  together  with  the 
titles  of  their  talks,  but  also,  many  others  that  appeared  on  the  program 
or  were  members  of  the  audience. 


SPEAKER  AND  AFFILIATION 

Professor  Thomas  Kailath 
Stanford  University 

Professor  Ted  Belytschko 
Northwestern  University 


Professor  A.  R.  Kapi 1  a 
Rensselaer  Polytechnic 
Institute 


TITLE  OF  ADDRESS 

Some  New  Applications  of  Matrix 
Displacement  Structures 

Nonmonotonic  Stress-Strain  Laws: 
Bizarre  Behavior  and  Its  Repercus¬ 
sions  on  Numerical  Solutions 

Recent  Developments  in  the  Theory 
of  Compressible  Reactive  Flows 


Professor  Moss  Sweedler  Applicable  Algebraic  Methods 

Cornell  University 


iii 


j 


Professor  Oliver  A.  McBryan 
University  of  Colorado 


Promise  vs.  Performance  for 
Massively  Parallel  Computers 


Professor  Robert  B.  Schnabel 
University  of  Colorado 


New  Sequential  and  Parallel  Methods 
for  Unconstrained  Optimization 


Professor  Luc  Tartar 
Carnegie-Mel 1  on  University 


How  to  Describe  Oscillations  of 
Solutions  of  Nonlinear  Partial 
Differential  Equations 


The  members  of  the  AMSC  would  like  to  express  their  thanks  to  the  speakers 
and  research  scientists  who  participated  in  this  meeting,  and  to  all  the 
attendees  for  supporting  it  with  many  stimulating  questions.  The  AMSC  is 
pleased  to  be  able  to  publish  in  these  Transactions  many  of  the  conference 
papers  and  thus  to  make  available  to  the  scientific  community  some  of  the 
research  results  presented  at  this  meeting. 
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Divide-and'Conquer  Solutions  of  Least*Squares  Problems 
for  Matrices  with  Displacement  Structure 

J.  Qiun  and  T.  Kailath  t 

Information  Systems  Laboratory 
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Stanford,  CA  94305 

ABSTRACT.  A  divide-and-conquer  implementation  of  a  generalized  Schur  algorithm  enables  us  to 
solve  various  (exact  and)  least  squares  block-Toeplitz  or  Toeplitz-block  systems  of  equations  with 
0(a?nlo^n)  operations,  where  the  displacement  rank  a  is  a  small  constant  (typically  between  2  to  4 
for  scalar  near-Toeplitz  matrices)  independent  of  the  size  of  matrices. 

1.  Introduction. 

In  recent  years,  there  has  been  considerable  research  on  fast  algorithms  for  the  solution  of  linear 
systems  of  equations  with  Toeplitz  matrices.  The  Levinson  and  Schur  algorithms  allow  (recursive) 
solutions  with  O(n^)  floating  point  operations  (flops)  for  systems  with  n  x  n  Toeplitz  matrices. 

In  1980.  Brent  et  al  [5]  described  a  (nonrecursive)  scheme  for  obtaining  a  solution  wifli 
O(nlo^n)  flops.  This  was  based  on  two  ideas  -  the  use  of  the  Gohberg-Semencul  formula  [10],  [11], 
[IS]  for  the  inverse  of  a  Toeplitz  matrix,  and  the  use  of  divide-and-conquer  (or  doubling)  techniques 
for  computing  (generators  oQ  the  Gohberg-Semencul  formula. 

Let  X  and  y  denote  the  first  and  last  columns  of  T"*  e  .  Then  if  the  first  component  of  x, 
say  Xi,  is  nonzero,  Gohberg  and  Semencul  [11]  showed  that  we  could  write 

r-*  =  -5-[L(x)L^(7,y)  -L(Z,y)L^(Zj„x)]  (1) 

where  is  the  reverse-identity  matrix,  Z„  is  the  shift  matrix. 


h  * 

and 

L(v)  =  a  lower-triangular  Toeplitz  matrix  with  first  column  v. 

The  significance  of  (1)  in  the  present  application  is  that  the  product  of  a  vector  and  a  lower-  or  upper- 

t  This  woik  wu  suppoited  in  ptit  by  the  U.S.  Army  Reiesrch  OfSoe  under  Contnct  DAA1j03-86-K-004S,  the  SOIO/IST.  managed  by  the 
Army  Research  Office  under  Qmtraci  OAALO3-S7-K-O033,  and  the  National  Science  Foundation  under  GraiK  .VlIP-2131S-;\2, 
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triangular  Toeplitz  matrix  is  equivalent  to  the  convolution  of  two  vectors,  which  can  be  done  in  a  fast 
way  using  0(rtlog/i)  flops  (see,  e.g.  [4]).  This  compares  with  the  O(n^)  operations  required  with  the 
non-Toeplitz  triangular  matrix  factor  of  (obtainable  in  O(n^)  flops  via  the  Levinson  algorithm). 
Brent  et  al  showed  how  to  use  divide-and-conquer  techniques  combined  with  a  fast  Euclidean 
algorithm  (faster  than  the  one  in  [1])  to  obtain  the  vectors  {x,  y)  of  the  Gohberg-Semencul  formula 
with  O(rtlog^n)  flops.  Later  Bitmead  and  Anderson  [3]  and  Morf  [19]  used  another  approach,  based 
on  the  displacement-rank  properties  of  matrix  Schur  complements,  to  obtain  similar  results;  while  this 
approach  allows  for  generalization  to  non-Toeplitz  matrices  (further  discussed  below),  the  hidden 
coefficient  in  their  proposed  O(nlog^n)  constructions  turned  out  to  be  extremely  large  (see  Sexton 
[23]).  Later  Musicus  [20],  Bruckstein  and  Kailath  [6],  de  Hoog  [9],  Ammar  and  Gragg  [2]  used  an 
approach  based  on  the  Schur  (rather  than  Levinson)  algorithm  to  obtain  better  coefficients;  in 
particular,  Ammar  and  Gragg  made  a  detailed  study  and  claimed  an  operation  count  of  Snlo^n  flops. 
With  this  count,  the  new  (called  superfast  in  [2])  method  for  solving  Toeplitz  .systems  is  better  than  the 
one  based  on  the  Levinson  algorithm  whenever  n  >  256.  We  should  mention  here  that  Schur- 
algorithra-based  methods  are  namral  in  the  context  of  transmission-line  and  layered-earth  models,  so  it 
is  not  a  surprise  that  similar  techniques  were  also  conceived  in  those  fields  -  see  Choate  [7],  McQary 
[18]  and  Bruckstein  and  Kailath  [6].  A  good  source  for  background  on  the  Levinson  and  Schur 
algorithms,  transmission  line  models,  displacement  representations  as  mentioned  and  used  in  the 
present  paper  may  be  [12]. 

Our  paper  is  in  the  spirit  of  the  methods  based  on  the  Schur  algorithm,  but  is  more  general 
without  the  drawback  of  large  coefficient  of  the  methods  by  Bitmead  and  Anderson  or  Morf.  We  can 
handle  matrices  such  as  (T^T)"*  and  where  T  may  be  a  near-Toeplitz  matrix  iiKluding 

rectangular  block-Toeplitz  matrices  and  Toepliiz-block  matrices;  in  particular,  therefore,  we  can  also 
obtain  the  least-squares  solutions  of  over-determined  Toeplitz  and  near-Toeplitz  systems  with 
O(nlog^n)  flops. 

An  outline  of  our  approach  is  the  following.  For  a  matrix  E , 
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^2.1  ^22 


El  l,  nonsingular. 


the  Schur  complement  of  £  ^  in  £  is 

^  -^1,1  “  ^2.1^  rl^  1.2- 


Notice  that  matrices  such  as 


SisT"*,  52  3  (T^7')-‘,  Sj3(r‘T)~^T' 

can  be  identified  as  the  Schur  complements  of  the  following  extended  matrices. 
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Now  the  matrices  £  in  (3)  have  the  following  (generalized)  displacement  representation,  for  suitably 
chosen  matrices  {F^ ,  £*  ] , 


where  KiXi,  F^)  and  K(yi,  F’’)  are  lower  triangular  matrices  whose  j  columns  are  and 

(/r*)0-Uy.^  respectively.  The  smallest  possible  number  a  is  called  the  displacement  rank  of  £  with 
respect  to  [F^ ,  £* ).  For  an  example,  let  T  be  an  m  x  «  scalar  Toeplitz  matrix,  with  m  >  n.  Then 


the  matrix  £2  has  the  displacement  rank  4  with  respect  to  {£,£),  where  F  = 
displacement  representation  [13], 
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If  we  define  xf  s  [w,^,  v,^],  note  that  the  matrix  Kix^,  £)  in  (4a)  has  the  form 
£(w.)  O 
£(v.)  O 


e  R 


2ax2ii 


,  0  e  R"’^, 


(4b) 


where  £(w,  )  and  £(v,  )  are  lower  triangular  Toeplitz  matrices  with  first  columns  w,  and  v, . 


Given  a  displacement  representation  of  £,  we  use  a  certain  generalized  Schur  algorithm  [8],  [13] 
to  successively  compute  displacement  representations  of  the  Schur  complements  of  all  the  leading 
principal  submatrices  in  £.  For  the  above  example,  n  steps  of  the  generalized  Schur  algorithm  will 
yield 

O  F)  -  i^r(u,-,  F)K\n„  £), 

.  ^  ^  j  1*1  j=*3 


where  the  top  n  elements  of  u,  are  zero.  Therefore,  if  we  denote  the  bottom  n  elements  of  u,  as  U2,,-, 
we  can  re-write 

(jTryl  =  2:^(U2,i)£^(U2,)  -  Z£(U2..)£^(U2,). 

1-1  i-3 

Now,  the  generalized  Schur  algorithm,  which  is  a  two-teim  polynomial  recursion,  can  be 
implemented  in  a  divide-and-conquer  fashion  with  O (n)logn)  flops,  where  f(n)  denotes  the 
number  of  operations  for  the  multiplication  of  two  polynomials.  Therefore,  if  the  multiplication  of  two 
polynomials  is  done  again  by  divide-and-conquer,  i.e.,  by  using  fast  convolution  algorithms,  then  the 
overall  computation  requires  0(a^«log^n)  flops.  We  remark  that  the  factor  a?  can  be  reduced  to  a  if 
several  convolutions  can  be  performed  in  parallel.  Once  we  have  a  displacement  of  the  desired  Schur 
complement  S,  the  matrix-vector  multiplication,  Sb  can  be  done  with  0(cyilog/i)  flops  using  fast 
convolutions.  As  an  example,  we  can  obtain  the  least  squares  solution  for 

Tx  =  h,  r  6  R""^,  m  >  n 


by 

(i)  Multiply  T^b  using  a  fast  convolution  algorithm, 

(ii)  Obtain  a  displacement  representation  of  using  the  divide-and-conquer  version  of 

generalized  Schur  algorithm, 

(iii)  Multiply  (T^T)~^(T^h)  using  a  fast  convolution  algorithm. 
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If  we  obtain  displacement  representation  of  (T^D  directly  using  £3,  then  the  step  (i)  would  not 
be  needed. 

2.  Generalized  Schur  Algorithm. 

After  a  brief  review  of  basic  concepts  and  definitions,  we  shall  present  the  generalized  Schur 
algorithm  in  polynomial  form. 

Generators  of  Matrices. 

Let  and  be  nilpotent  matrices.  The  matrix 

is  called  the  displacement  of  A  with  respect  to  the  displacement  operators  {F^ ,  F* ) .  Define  the 
{F^ ,  F*"  )-displacement  rank  of  A  as  rank[V^^/^fcy4  ].  Any  matrix  pair  {X ,  T )  such  that 

=  xy^,  X  H  [  Xi.  Xj,  .  .  ,  Xa  1.  Y  s[  yj,  yj. .  .  ,  ]  (5) 

is  called  a  (vector  form)  generator  of  A  with  respect  to  {F'^,  F* }.  The  generator  will  be  said  to  have 
length  CL  If  the  length  a  is  equal  to  the  displacement  rank  of  A ,  we  say  that  the  generator  is  minimal. 
A  generator  such  as  K  =  XI,  where  I  is  a  diagonal  matrix  with  1  or  -1  along  the  diagonal,  is  called  a 
symmetric  generator. 

The  following  Lemma  [13],  [14]  establishes  the  connection  between  generators  and  displacement 
representations. 

a 

Lemma.  Let  £  be  an  m  x  n  matrix.  If  F^  and  F^  are  nilpotent,  then  =  ^,yi^  has  the 

unique  solution  E  where  X(x,  ,  F^)  *  [x,  ,  F-^x,  ,  •  •  ,  F'^^""'^x,  ], 

1 

A:(y,-.F^)  =  [y,-,F\v,,  •  ■  ,F*('-‘>y.]. 

Choice  of  Displacement  Operators. 

The  generalized  Schur  algorithm  operates  with  generators,  and  needs  O  (amn )  flops  for  sequential 
implementation  and  0(a^/ilog^n)  for  divide-and-conquer  implementation.  Therefore,  for  a  given 
matrix  A ,  we  should  try  to  choose  the  displacement  operators  that  give  the  smallest  a.  If  the  matrix  A 
is  an  n  X  n  Toeplitz  matrix,  the  appropriate  displiwrement  operator  F  is  Z„ ,  an  n  x  n  shift  matrix.  If 
A  has  some  near-Toeplitz  structure,  then  F  would  have  forms  such  as 

F  =  Z,  ©Z„ ,  F  =  ©Z„  ,  F  =  Z„^, 

'Zn  Ol 

where  ©  denotes  the  direct  sum,  Z„  ©Z„,  s  ^  7  ,  and  ©  denotes  the  concatenated  direct  sum. 

L  ^  j  i-l 

Example  I.  Let  T  =  be  an  m  x  n  pre-  and  post-windowed  scalar  Toeplitz  matrix,  i.e.,  r,  j  =  0  if 
j  >  i  OT  i  >  m  -  n  +  j  with  m  >  n.  Then  it  is  easy  to  check  that  the  matrix  C  =  (c,_y)  3  T^T  is 
also  a  (unwindowed)  Toeplitz  matrix,  and  with  respect  to  (Z,  ©Z„,  Z„  ©Z„ },  £3  in  (3)  has  a 
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generator  {X ,  K )  of  length  2,  where 

xi  =  [cq.  Cl,  •  •  ,  c„,  -1,  0,  •  •  ,  Qf  Icq^, 

X2  =  [0,  Cl,  •  •  ,  c„,  -1,  0,  ■  ■  ,  Gf  Icq^, 

yi  ~  0»  ^  ‘  ‘  •  ^0*  '  '  ^m-ii »  0*  '  '  0  ]  ^^0  • 

y2  =  -[0,  Cl,  •  •  ,  C„,  fo.  tl.  ■  ■  tm-n>  0,  •  •  0  f  ICq^.  □ 


Example  2.  If  7  is  a  Toeplitz-block  matrix,  i.e.. 


Ti.i 

^2.1 

^2.2 

Tij  -  scalar  m,  x  nj  Toeplitz  matrix. 


(6) 


then  for  the  matrices  E  in  (3),  we  choose  [8],  [13]  the  following  displacement  operators 
M  s 


£i:  =  [©Z„,,]©Fi. 

£*  =  [©Z,.]©Fi, 

m  =  n, 

(7a) 

£2:  =[©Z,J©£i, 

N 

F*  =  [©Z,j]©Fi, 

m  =  n. 

(7b) 

N 

£3:  £^  =  [®Z,]©£i, 

l«l 

N  M 

F"  =[©Z,,.]©[®Z^ 

i»l  i«l 

(7c) 

N 

where  Fx  can  be  either!  Z„  or  ©Z„,. 


Example  3.  On  the  other  hand,  if  the  matrix  T  in  (3)  is  a  block-ToepIitz  matrix  with  (3  x  p  blocks, 

Bq  5_i  • 


T  = 


B,  B. 


•/V+2 


e  e  maA/p,  nsivp. 


(8) 


®M-2  ■  ^-N+M 

then  for  the  extended  matrices  E,  we  should  choose  [8]  the  displacement  operators 
f/  =Z^<S,ZI  F»  =Z^QZl 


(9) 


where,  for  £  i  we  assumed  that  7  is  a  square  n  x  n  matrix. 

Generators  of  the  above  and  other  extended  block-Toeplitz  or  Toeplitz-block  matrices  can  be 
found  in  [8]  and  [13]. 


Polynomial  Form  of  Generators. 


-  —  ■  N 

t  For  the  divide^uid-conquer  impiementitioru  we  prefer  to  choose  see  the  Remark  in  Sec  4. 

•  ■i  » 
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In  general,  the  displacement  operators  and  for  both  extended  block-Toeplitz  matrices  and 
extended  Toeplitz-block  matrices  have  the  form, 

N  N 

F  =  ©2„P,  ns  (10) 

•=i  1=1 

We  shall  say  that  the  displacement  operator  F  in  (10)  has  N  sections.  One  of  the  key  operations  in 
generalized  Schur  algorithms  is  matrix-vector  multiplication,  Fv,  i.e,  a  sectioned  shift  operation.  With 
the  polynomial  representation  of  vectors,  the  shift  operation  has  a  nice  algebraic  expression.  For  a 
given  vector  v,  let  v(z)  denote  the  polynomial  whose  coefficient  for  the  term  z‘  is  the  i-t-lst 
component  of  the  vector,  i.e., 

V  =  [vq.  Vi,  V2,  •  •  ,  v(z)  =  Vo-(-ViZ  -t-V2Z^+-  ■  V„_iZ'’“‘.  (11) 

Then, 

Z„v  s  v'  =  [0,  Vq,  vi,  •  •  ,  v,_2]^  v(z)z  mod  z". 

In  general,  for  the  matrix  whose  displacement  operator  is  the  F  in  (10),  let  us  define  integers 
{5,}  by 

5.  =  Zrt*.  5i  <  52<  •  •  <  5^. 

*=i 

Let  v(z)  and  0(z)  be  polynomials  of  degree  less  than  or  equal  to  n-1,  and  define  the  degree  at  most 
(«,-l)  polyiximial,  v,  (z),  by 

v(z)  =  v,(z)  +  z^V2(2)  +  2%3(z)  +  •  •  +  (12a) 

Now  the  (polynomial  form)  displacement  operator  is  defined  by  the  following  operation, 

v(z)®f  0(z)  s  r(z)  s  ri(z)  +  z\2(z)  +  +  •  ■  +  z^'^-'r,^(z).  (12b) 

where 

/■;  (z )  s  V,  (z  )0(z  ^)  mod  z ,  ( I2c) 

i.e.,  /".  (z)  is  the  polynomial  v,  (z)0(z^)  after  chopping  off  the  higher  degree  terms,  so  that  r,  (z)  has  the 
degree  at  most  (n,-  -  1). 

Let 

X  =  [xi,  X2,  •  •  ,  xj,  Y  =  [yi,  y2.  •  •  .  yj 
be  a  generator  of  a  matrix  A  with  respect  to  certain  {F-^,  F* },  and  let 
x,-»x.(z),  y.-^y.-fw). 

Then  we  call  the  pair  of  polynomial  vectors,  {X(z),  Kfw)},  where 

X(z)  ■  [  x,(z),  X2(z),  •  •  ,  Xa(z)  1,  Y(w)  s  [  yi(H'),  y2('v)'  ’  '  >  ya('v)  ]• 
a  (polynomial  form)  generator  of  A.  with  respect  to  (polynomial  from)  displacement  operator 

[®pf, 


6 


Example  1  (Continued).  The  matrix  £3  in  (3)  has  a  generator  {X(z),  T(w)}  with  respect  to 
{ ®Ff,  }.  where  =  Z„  ©Z, ,F^-Z^  eZ„.  and 

xi(2)  =  (Co  +  cjz  +  •  •  +  c^z"  -  z"'^*]<^o‘'^. 

^2(2)  =  [ciz  +  cz^  +  •■  +  c„z"  -  z'''*’Mco‘^. 

yiCw)  =  [Co  +  CjW  +  •  •  +  C^H-"  +  tow’'*^  +  +  •  ■  + 

yziw)  =  -[Ciw  +  •  ■  +  c„w"  +  row""^'  +  riw""^^  +  •  •  +  t;„_„w'"'^']co*'^. 

Also  notice  that 

JCl(z)©/r/Z  =[CoZ  +Ci2^+-  •  +  C„_iZ"  -z"'"^]Co‘^, 

=  [coH-  +  Ciw^  +  •  ■  +  c^.iw"  +  +  riw"^^  +  •  ■  +  r«-«_iw"‘'"‘]co‘^.  □ 

Next  we  note  that  for  given  vectors  a  and  b  such  that  a^b  ^  0,  we  can  always  find  [8]  matrices 
©  and  ^  such  that 

a^e  =  [a  1',  0,  0,  •  •  ,  0],  b^'F  =  [6 1'.  0.  0,  ■  •  ,  0],  8  'F^  =  I,  (13) 

and  therefore,  a^b  =  ai'bi'.  We  define  polynomial  matrices  6(z)  and  'F(w)  by 


z 

w 

1 

1 

©(Z)3© 

. 

III 

. 

1 

1 

We  remark  also  that  if  a  =  b,  then  'F(h')  =  ©(h-),  and  if  b  =  Za,  where  Zs/^©-/^,  then 
'F(w)  =  ©(h')Z,  so  that  we  only  need  to  find,  and  post-multiply  by,  ©(z). 

Generalized  Schur  Algorithm 

Let  a  matrix  E  have  a  generator  {Xo(z),  To(w)}  with  respect  to  {©/-/.  ©/r»}-  and  define  £,  j 
by 

^1,1 

E  =  \p  p  e  R"’^, 

[*2,1  ^22, 

where  Eij  is  a  k  xk  strongly  nonsingular  matrix,  i.e.,  the  one  with  all  nonsingular  leading 
submatrices.  The  jk-step  of  the  generalized  Schur  algorithm  [81,  [13]  presented  below  in  polynomial 
form  gives  a  generator  of  the  matrix, 

with  respect  to  [®^/.  ®/r*).  or  equivalently,  a  generator  of  S  with  respect  to  (©;'/,  where  F^ 
and  f*  denote  the  trailing  square  submatrices  of  size  (m  -  k)  and  (n  -  k)  of  F^  and  F*, 
respectively. 

Algorithm  (ik-step  Generalized  Schur  Algorithm) 
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Input:  Generator  of  E,  {Xo(z),  displacement  operator  0yr*};  Number  of  steps  k. 

Output:  Generator  of  5  {Xt(z), 

Procedure  GeneralizedSchur 
begin 

for  i  :=  0  to  it  -  1  do  begin 
:=  [z-‘Xi(z)],^; 
b^  :=  [z-y,(z)],^ 

Rnd  0,  (z)  and  to  transfonn  and  b^  such  as  (13); 

X..^i(z)  =  Xi(z)0p^e.(z);  Yi^^(w)  =  y.(H')0f^'P.(w): 

end 

return  {X*(z),  l*(w)}; 
end 

Remark.  The  polynomial  vectors.  Xi(z)  and  X,(w),  have  degrees  m-\  and  n~\  respectively,  for  all  i. 
Each  step  eliminates  the  non-zero  lowest  degree  term,  and  therefore  the  terms  of  X,(z)  and  T,(w) 
whose  degrees  are  less  than  z'  and  w'  are  zeros. 

By  applying  the  generalized  Schur  algorithm,  one  can  obtain  generators,  or  equivalently 
displacement  representations  for  various  interesting  Schur  complements. 

3.  Divide<and-Conquer  Implementation. 

The  (sequential)  k-step  generalized  Schur  algorithm  in  Sec  2  can  also  be  implemented  efficiently 
using  divide-and-conquer  approach.  We  shall  only  explain  how  to  find  Xt(z);  essentially  the  same 
argument  applies  for  T*(w). 

Let  us  define  0^,^(z)  and  X^;^(z)  by 
ep:„(r)  =  0/z)0p+i(2)--0,(z). 

A^p.<,(^)=^O:<,(z)0f/0o:p-i(z).  X 3  X ^z)  mod  z’*\ 

where  ^q.  The  polynomial  matrix  Qp.^{z)  has  a  degree  ^-p+l.  The  polynomial  vector 
Xp.^(z)  has  degree  q,  and  is  obtained  by  dropping  from  Xp{z)  all  terms  of  degree  higher  than  z'^. 
Also  note  the  useful  properties. 

[j:(2)0F0l(2)]©F02(z)  =  ^(z)©f[Qi(2)^2(z)1- 

[Xi(z)  -I-  X2(Z)]0F0(Z)  =  [X,(z)0f  e(z)]  +  [X2(2)0F0(z)]. 

These  properties  and  the  fact  that  Qp.giz)  is  completely  deteraiined  by  Xp;,(z)  allow  a  divide-and- 
conquer  implementation  of  the  generalized  Schur  algorithm. 

Given  Xp.^{z),  we  can  compute  0p;^(z)  as  follows.  If  p  ~q,  then  we  are  successful,  and 
compute  &pp(z)  =  Bp(.z).  Otherwise,  we  choose  an  appropriatet  division  point  r  such  that 
p  <  r  <q,  and  try  to  solve  the  smaller  sub-problem  of  finding  0p.,_i(r),  given  Xp.,_i(z).  Once  we 
know  0p.;._i(z),  we  can  compute  X,..,(z)  by 


X,.,(Z)  =Xo;,(2)©^/eO;r-l(z)  =  [Xo:,(z)®F/6o:p-l(z)]®/r/0p:r-l(z)  (15a) 

=  X^:,(z)0^/e^^_i(z).  (15b) 

Now  we  again  try  to  find  0,;,(z)  given  X,.^(z).  After  we  obtain  Q^.^iz),  we  can  combine  the  two 
results,  &p.r^i(z)  and  0,..^(z),  by  multiplication, 

Qpuj(z)  =  ®p:r-l(z)®r:<7(z)-  (16) 

Programming  details  of  the  above  recursive  generalized  Schur  algorithm  are  shown  in  the  Appendix. 

The  previous  recursive  description  can  be  visualized  nonrecursively  using  trees  (see  Fig  1  and  2). 
Each  node  in  the  tree  is  annotated  with  the  rules:  "find",  "apply"  and  "combine", 

fp.p  :  Find  Qp-piz), 

ap:q  ■  ^r;fl(z)  :=^P;,(z)®fQpt-i(z). 

Cp:q  ■  0p.v7(z)  :=0p.r-l(z)9r:^(z)• 

We  traverse  the  tree  in  post-order  (i.e.,  follow  the  order  labeled  on  each  node  of  the  tree),  and  evaluate 
the  rules. 

Now,  we  shall  consider  two  examples  in  detail. 


Example  4.  Pseudo-Inverse  of  pre  and  post  windowed  Toeplitz  Matrices. 
Consider  the  matrix  £3  in  Example  1,  where 


T^T  = 


It  is  desired  to  find  a  displacement  representation  of  (T^Ty^T^.  This  can  be  done  by  the  4-step 
recursive  generalized  Schur  algorithm.  The  input  to  the  algorithm  is  a  generator  {Xo(z),  To('v)l  of 


ri6 

8 

4 

.1 

3 

2 

1 

1 

-1 

0 

0 

0 

8 

16 

8 

pT  ^ 

0 

3 

2 

1 

1 

-1 

0 

0 

4 

8 

16 

8 

*  i 

0 

0 

3 

2 

1 

1 

-1 

0 

1 

4 

8 

16  j 

0 

0 

0 

3 

2 

1 

1 

-1 

£,= 


•pT  p  pT 

-/  0 


with  respea  to  {0^/,  0^*},  where  =Z„©Z„,  F’’  =Z„®Z„.  The  output,  1X4(2),  T4(>v)}  is  a 
generator  of  (T^Ty^T^,  with  respect  to  {0z^,  ©z;,}-  The  computational  sequence  is  illustrated  in 
Fig  1,  where  it  was  assumed  that  the  division  points  were  chosen  successively  by  2,  1  and  3. 


1  0 

z 

(!)•  /  0:0'-  0O;o(^)~ 

0  -1 

1 

because  Xo.(fz)  =  [4,  0] 

(2).  Xi-iCz)  =  Xo:i(z)0^/0O:o(z)  =  [4  +  2z ,  2z  ]  0^/ 0O;o(z )  =  [4z ,  -2z  ] 


2 

1  +1/2 

z 

(3).  /l:l 

ei:i(z)  =  -;|- 

-1/2  -1 

1 

z  ^  ~z/2 

(4).  Co;i:  0O;l(z)  =  0O:o(z)01:l(z)  =  I  J 
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(5).  ao:3:  -^2:3(2 )  =  -^0:3(2 )0F/Qo:i(2 )  =  ;|"[32^  +  32^/2.  -Z^/4] 


because  A’v-.Cz)  =  -=r-[3z^  0] 
V3 


(7).'  X3j(z)  =  X2:3(z)©f/e2:2(z)  =  z^M] 


1  0 

2 

(6).  f2,2:  ©2:2(2)  = 

0  -1 

1 

(8)-  /3:3:  03:3(2)  = 


12 


VT43 


1  1/12 

■ 

2 

-1/12  -1 

1 

(^)-  ^ 2:3-  02:3(2  )  =  02:2(2  )03:3(2  )  -  ' 

24 


2 

z/12 

2 

1/12 

1 

1 

(10).  CQ,y.  ©0:3(2)  =  0O:l(z)02:3(z)  = 


z'‘-z2/24  z^/12-z/12 

-z^in+zin  -z2/24+1 


(11).  a  0:?:  4:7(z  )  =  [4+2z  +2  Vz  ^/4-Z  ^/4,  2z  +2  ^+Z  ^’4-Z  ^/4]  ®  pf  ©0:3(z  ) 

=  [(4+22+2^+z^/4,  2z+z^+2^/4)  -  2‘*(1/4,  1/4)] 0^/ ©0.3(2) 

=  -2‘‘[(1/4.  1/4)©0:3(z)  mod  2**] 

=  -  [z/12-2^/24-2^/2,  1-z/2-z2/24+z^/12] 

Because  7^7  is  symmetric,  'Fo:3(m')  =  ©o.3(h')Z,  where  Z  =  1©-1,  and  therefore, 
)'4:13(h')  =  [(4+22 +z^+z^/4)+z '‘(3/4+2 /2+z2/4-2^/4), 

(2z  +2  2+Z  ^/4>l-Z  ^3/4+2  /2+Z  ^/4+2  ^/4-2  '‘/4)]  0^»  ©0:3(w  )Z 

=  -p?7^[l/4z +2^/24-32^/24492 ‘’/24+1  lz’/8+13z®/24+3z'^/2, 

-s-z/i+z^/s-iz^/d+nz^-imAz^-z^/B-zVn]. 


Therefore, 

(7^7)-‘7^  =  Y2[L(x,)L’'(y,)  +  Z.(X2)L^(y2)],  Y=  ;j=^. 

where  Z(x,)  and  Z(yj)  are  the  lower  triangular  Toeplitz  matrices  whose  first  columns  are  x,  and  y,  , 
respectively,  and 

xi  =  (0,  -1/12,  1/24,  l/2f , 

X2  =  [-1,  1/2,  1/24,  -1/12]^, 

yi  =  [0,  1/4,  1/24,  -3/2,  49/24,  11/8,  13/24,  3/2]^, 

yj  =  [-3,  -1/2,  1/8,  -2/3,  11/8,  -13/24,  -1/8,  1/12]^.  □ 
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Remark  1.  For  a  synunetric  generator  of  length  2  with  ^  =  1,  the  2  x  2  polynomial  matrix  &(z)  in 
(14)  can  have  the  form  (hypertwlic  reflection) 


e.(z)  = 


ch,z  shi 
-shi  z  -chi  ’ 


-  shi^  =  1. 


Let 


©p^(z)  s  0p(z)0^+i(z) 


0^(2)  = 


0i.i(z)  ®u(2) 
02,i(z)  02^(z) 


Then,  by  induction,  one  can  easily  prove  that 

r^-^^'ei.i(2-')  =  (-l)^-^-"‘e2^(2).  z-»-/’^'0,^(z-»)  =  (-l)^-^^^©2.i(r). 


Therefore,  we  need  to  compute  and  store  only  two  entries  of  ©^.^(z). 


Remark  2.  For  an  unwindowed  scalar  Toeplitz  matrix,  the  matrix  £2  (3)  has  a  displacement  rank  4, 

whereas  the  matrix  £3  has  a  displacement  rank  S.  Therefore,  it  is  more  efficient  to  And  a  displacement 
representation  of  (T^T)"^  rather  than  of  (T^T)~^T^  when  we  solve  Toeplitz  least  squares  problems. 
With  the  notation  in  (4),  the  matrix  £2  for  an  unwindowed  scalar  Toeplitz  matrix  T  =  (fi-j)  e  R'"’^ 
(m  ^  n)  has  a  generator  [13], 

W,  =  r^ti/lltill,  W2  =  t2,  W3  =  Z^Zjwi,  W4  =  Z,I, 

“  (^0»  •  ^m-l]  •  ^2  “  ^-l*  ‘  ’  *  ^  ~ 

Vi  =  V3  =  Ci/lltiH,  V2  =  V4  =  0, 

where  M  il  denotes  the  Euclidean  norm,  and  ei  is  the  vector  with  I  at  the  first  position,  and  zeros 
elsewhere. 


Example  5.  Displacement  Representation  for  the  inverse  of  a  Sylvester  Matrix. 


Let  T  denote  the  following  Sylvester  matrix. 


T  s 


2  0  0  1  0 
1  2  0  2  1 
3  12  12 
0  3  111 
0  0  3  0  1 


(17) 


and  suppose  that  it  is  desired  to  obtain  a  displacement  representation  of  T"'.  Then  the  appropriate 
extended  matrix  is 


£,= 


T  I 
-I  O  ’ 


(18) 


and  it  is  easy  to  see  that  the  following  lX(fz),  Y(fyv)}  is  a  generator  of  £1  with  respea  to 
{0/r/.  where  =Z5©Z5,  £*  =Z3©Z2©Z5; 


Xo(z)  ■  [x,(z),  X2(z),  X3(2)],  ToCw)  »  Lyi(H'),  y2(w).  y3(w)] 
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Xi(z)  =  2  +  z  +  3z^  ~  2^,  ^2(2)  =  1  +  2z  +  -  z*.  X3(z)  =  1.  (I9a) 

}’i(w')=l.  >'2(H')  =  >*’^  y-}(w)  =  w^  (19b) 

Now  the  5-step  recursive  generalized  Schur  algorithm  gives  a  desired  generator  of  with  respect  to 
{Z5,  Z5},  and  a  possible  computational  sequence  is  shown  in  Fig  2,  where  the  division  points  are 
chosen  successively  as  2,  1,  3  and  4. 


z  -1/2  -1/2' 

w  0  0 

(1)-  f  0:0-  ®0:0(2)  = 

0  1  0 

w/2  1  0 

0 

0 

1 _ 

.  w/2  0  1 . 

{2\  aQ.y.  Xi:,(z)  =  [2z.3z/2, -z/2]. 

yj:l(w)  =  [W 

p 

p 

Z 

-3/4  -1/4 1 

w 

0 

o' 

(3).  /i:,: 

9i;i(2)  = 

0 

1 

0 

= 

3h'/4 

1 

0 

.0 

0 

1 

-w/4 

0 

1_ 

'z^ 

-32/4-1/2 

z/4-1/2' 

0 

0 

(4)-  co;i: 

00:1(2  )  = 

0 

1 

0 

9 

= 

w 

^/2+3w/4 

1 

0 

.  0 

0 

1 

w^l2-wl4 

0 

1 

(5).  a 0:4:  X2a(2  )  =  [2z 4z ^-f-3z  ^  -5z h4-5z V4,  -5z 2/4+3z ^14] 
y2A(yv)  =  i'o:4(H')0;r*'I'o:l(w) 

=  [(li  0,  0)4'o;i(h')  mod  w^]  +  w^[(0,  w,  0)'Fo;i(>*')  mod  w^] 
=  [w^+3w^/4,  0] 


'z  5/8  5/8' 

w  0  0 

(6)-  /  2:2;  ^20(2  )  - 

0  1  0 
.0  0  1  . 

'i'2:2(^)  = 

-5w/8  1  0 
.-5w/8  0  1. 

(7).  02:4;  ^3:4(2)  =  [2z’-^^^  -5z^/8+15z‘^/8,  1  lz3/8-H5z^/8], 

=  l'2:4(H')©/r*^2:2(**') 

=  [(w^  0,  0)'F2;2('^)  mod  w^]  -1-  w^[(3w/4,  1,  0)'F2:2(w)  mod  w^] 
=  [-5w'‘/8.  w\  0] 


0 

0 

'-16w/5  1  o' 

(8)-  /  3:3;  ®3:3(2  )  = 

z  16/5  11/5 
.0  0  1  . 

,  'P3:3(>»')  = 

w  0  0 

.-llw/5  0  1. 

(9).  03:4;  2(4:4(z)=  [-Sz^/S.  7z^  6z^],  y4:4(>v)  =  -5w^/8.  0] 


z/(2>/2)  28/(5v'2)  6/5  ' 

w/(2^/2) 

5/(16V2)  0 

(10).  C4.4:  04.4(2)  — 

-5z/(16V2)  1/(2>'2)  -3/4 

4'4:4(1v)  = 

-28w/(5^/2) 

1/(2V2)  0 

0  0  1. 

-12vIw/5 

0  1 

- 
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After  evaluating,  c^a,  C2a  and  cq-^.  we  obtain  0o;4(2)  and  4'o;4(w).  and  finally 

(14).  00:9=  ^0:9(2)  =  Ul(z)-Jf2(z).JC3(z)]0f/®O:4(2) 

=  -Z^  O)0^/eo:4(2)l 

=  Z^[(-l, -Z^  0)00:4(2)  mod  Z*]  =  z*[«l(z),  U2(2),  U3(z)], 

where 

wi(z)  =  -zl{l<2yz\2<l^z'^l<i  +  z*l<l 

M2(2)  =  4/(5V2)  +  4z/V2  +  16z2/(5>^)  -  2Sz^l(5'l2)  -  28z‘*/(5V2) 

«3(z)  =  2/5  +  z/5  +  2z^/5  +  zVS  -  6z'‘/5. 

I'o:9(w)  =  [yi(w),  y2(w),  y3(>v)]®^*4'o:4(H') 

=  W^[(0,  0.  l)®yr*^0:4(w)]  =  W^[v,(w),  V2(w),  V3(w)], 

where 

vi(w)  =  -n^wJS  +  12w^/(5>i2)  +  \2w'^l{5<2)  -  \2'w*'l{5<2), 

V2(w)  =  -W/V2  +  w2/(2V2)  +  w^/(2V2)  -  w'’/(2>/2), 
vj(w)  =  1. 

Therefore, 

r-‘  =  L(U,)L^(Vi)  +  L(U2)L^(V2)  +  L(U3)L^(V3), 

where  u,-  and  v,  are  the  vectors  whose  yth  component  is  the  coefficient  of  z-'"*  and  w-'"’  of  u,  (z)  and 
Vi(w),  respectively. 

□ 

Remark  1.  If  we  had  chosen  the  displacement  operator  =  Z5©Z3©Z2,  F*  =  Zj®Z2®Zs  for  the 
matrix  T  in  (17)  we  would  have  the  same  generator  (19)  for  E^,  but  the  obtained  generator  of  T~^ 
would  be  the  one  with  respect  to  {Z3©Z2,  Z5)  rather  than  {Z5,  Z5}.  The  displacement  ranks  of  f’ 
with  respect  to  both  of  the  displacement  operators  are  2,  and  the  above  procedure  gives  non-minimal 
generators  of  length  3. 

Remark  2,  The  following  extended  matrix 

T  =  Sylvester  matrix  (20) 

also  has  a  displacement  rank  3.  One  could  as  well  obtain  the  solution  r“’b  directly  by  applying 
recursive  generalized  Schur  algorithm  to  (20);  the  last  column  of  X,  where  {X,  y}  is  the  computed 
generator  of  with  respea  to  {Z„ ,  1 ),  will  be  7~'b. 

4.  Polynomial  Products  with  Fast  Convolutions. 
The product of two polynomials of degree $d_1$ and $d_2$ can be computed efficiently using $d$-point fast cyclic convolution algorithms with $d \ge d_1 + d_2 + 1$ [4]. If d is a power of two, then a d-point fast cyclic convolution needs O(d log d) flops. If d is not a power of two but is a highly composite number, then the number of computations is close to O(d log d). Among other techniques, fast Fourier transforms (FFTs) can be used for convolutions: Ammar and Gragg [2] carefully examined the use of FFTs for a doubling algorithm for square Toeplitz systems of equations. In this paper we shall consider only the subtle complications that arise in the recursive generalized Schur algorithm.
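A minimal Python sketch of this idea, assuming the simplest case in which d is rounded up to a power of two: the product of two coefficient lists is obtained as a zero-padded cyclic convolution computed with a textbook radix-2 FFT. The helper names `fft` and `poly_mul` are illustrative, not from the paper, and no attempt is made at the constant-factor tuning studied by Ammar and Gragg [2].

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True the conjugate twiddle factors are used, so the caller
    must still divide by len(a) to obtain the inverse transform.
    """
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def poly_mul(p, q):
    """Coefficients of p(z) * q(z), lowest degree first.

    A cyclic convolution of length d >= deg p + deg q + 1 has no
    wrap-around, so it equals the ordinary polynomial product.
    """
    need = len(p) + len(q) - 1              # d1 + d2 + 1 coefficients
    d = 1
    while d < need:                          # round d up to a power of two
        d *= 2
    fp = fft(list(p) + [0] * (d - len(p)))
    fq = fft(list(q) + [0] * (d - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    return [v.real / d for v in prod[:need]]


print([round(c) for c in poly_mul([1, 2], [3, 4])])   # (1 + 2z)(3 + 4z) -> [3, 10, 8]
```

Choosing d smaller than $d_1 + d_2 + 1$ would make the high-order coefficients wrap around onto the low-order ones, which is why the padding length matters.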

The polynomial matrix-matrix product of (16) needs α³ (q−p)-point cyclic convolutions. The polynomial vector-matrix product of (15b) has α² scalar polynomial products of the form,
Jt(z)©/r/0(2).  where  x(z)  is  a  polynomial  with  nonzero  terms  of  z^,  •  •  ,  z*^.  Let  us  assume  that 

0<6i<-<5,  Sp<  5,+i  <  •  •  <  5,  <  r  <  <  -<6,<q<  <  •  •  <  5^. 


Then 

x'(z)  sx(z)0^/0(z)  (21a) 

=  [z*'x/+i(z)  +  z*'*^‘x/+2(z)  +  ■  •  +  z*'Xj+i(z)  -t-  •  +  z^'x,+i(z)]0;r/0(z)  (21b) 

=  [2S^i(z)  +  •  •  +  2*'-'j:.(z)]®/r/8(z)  (22a) 

-(-z^'LJ:,+i(2)e(zP)  mod  z"'^*]  (22b) 

+  2®'“(x,+2(2)0(2^)  mod  z""']  (22c) 

+  2®'[^,+i(z)0(2^)  mod  z"'**]-  (22d) 


The terms in (22a) do not need to be computed because they sum to zero once all the partial sums in the vector-matrix multiplication of (15b) are added. Recall that $x_i(z)$ has degree $n_i$,
and  0(z^)  has  degree  Therefore,  the  product  x,(z)0(z^)  from  (22b)  to  (22d)  can  be  performed 

by 

2«,-^l  point  cyclic  convolutionst  if  degree[0(zP)]  >  degree[x,(z)]  , 

point  cyclic  convolutions  if  degree[0(z^)]  <  degree[x,(z)]  . 

Remark. Notice that two d/2-point convolutions take cd log(d/2) flops if one d-point convolution takes cd log d flops. Therefore, the polynomial product (21) is more efficient for displacement operators with more sections, because such displacement operators break a long convolution into many smaller convolutions. Consequently, for a given matrix we prefer to choose a displacement operator with as many sections as possible, while keeping the displacement rank minimal.
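The flop count in the Remark can be checked with a small cost model. The sketch below assumes that an n-point convolution costs exactly c·n·log₂ n flops and that all sections have equal length; both are simplifying assumptions, and `split_cost` is a hypothetical helper.

```python
import math


def split_cost(d, m, c=1.0):
    """Flops for m convolutions of d/m points each, under the assumed
    cost model c * n * log2(n) for one n-point convolution.

    d is assumed divisible by m, so the total is c * d * log2(d / m).
    """
    n = d // m
    return m * c * n * math.log2(n)


for m in (1, 2, 4, 8):
    print(m, split_cost(1024, m))
```

For d = 1024 and c = 1 the model gives 10240, 9216, 8192 and 7168 flops for one, two, four and eight sections, i.e. c·d·log₂(d/m): the saving per extra level of splitting is c·d flops.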

If the dimensions of the matrix are powers of 2, then we can always choose the center division point, $r = \lceil (p+q)/2 \rceil$. In general, this balanced division (or doubling) gives the smallest number of computations. For this case, let $T(\eta)$ denote the number of computations for one recursion of size $\eta$.

† The first and last terms, (22b) and (22d), need convolutions with fewer points.
Then

$$T(\eta) \le 2T(\eta/2) + W(\eta), \qquad W(\eta) = O(\alpha^2 \eta \log \eta),$$

and therefore, one can show [1] that the k-step recursion takes

$$T(k) \le O(\alpha^2 k \log^2 k).$$
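To see where the squared logarithm comes from, here is a small Python check of the recurrence under illustrative assumptions that are not the paper's exact cost model: α is treated as a constant, W(η) = η log₂ η, T(1) = 0, and η is a power of two. For this model the recurrence has the closed form η·k(k+1)/2 with k = log₂ η, which grows like η log² η.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def T(eta):
    """Balanced recursion T(eta) = 2 T(eta/2) + W(eta) for eta a power of two,
    with the illustrative choices W(eta) = eta * log2(eta) and T(1) = 0."""
    if eta == 1:
        return 0.0
    return 2 * T(eta // 2) + eta * math.log2(eta)


for k in range(1, 11):
    eta = 2 ** k
    closed = eta * k * (k + 1) / 2          # exact solution of this model
    # The last column tends to 1/2: T(eta) grows like eta * log2(eta)**2 / 2.
    print(eta, T(eta), closed, T(eta) / (eta * k * k))
```

Dividing the recurrence by η gives S(k) = S(k−1) + k for S(k) = T(2^k)/2^k, which is the arithmetic series behind the closed form.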

However, in most cases doubling is not possible. In such circumstances, r should be chosen so that both r − p and the length of the second part, from r to q, are highly composite numbers (so that fast convolution algorithms can be applied efficiently), and so that r − p is close to (q − p)/2 (so as to achieve balancing).
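One way to make this choice concrete is sketched below in Python. It rests on two assumptions that are not from the paper: "highly composite" is interpreted as 5-smooth (no prime factors other than 2, 3 and 5), and the second part is taken to have length q − r + 1. The helper names are hypothetical.

```python
def is_smooth(n, primes=(2, 3, 5)):
    """True if n > 0 has no prime factors other than those in `primes`."""
    if n <= 0:
        return False
    for f in primes:
        while n % f == 0:
            n //= f
    return n == 1


def choose_split(p, q):
    """Pick r in (p, q] closest to the midpoint (p+q)/2 such that both
    r - p and q - r + 1 are 5-smooth; fall back to ceil((p+q)/2)."""
    mid = (p + q) / 2
    candidates = [r for r in range(p + 1, q + 1)
                  if is_smooth(r - p) and is_smooth(q - r + 1)]
    if not candidates:
        return (p + q + 1) // 2
    return min(candidates, key=lambda r: (abs(r - mid), r))


print(choose_split(0, 13))   # lengths 6 and 8 rather than 7 and 7
```

The tie-breaking key prefers the more balanced split first and the smaller r second; a real implementation would also weigh the actual convolution costs of the two lengths.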

Matrix-Vector  Products  using  Displacement  Representation. 

The final step in finding solutions of linear equations is the matrix-vector multiplication Sb, given a displacement representation of $S \in \mathbb{R}^{m \times n}$,

S  =  (23) 

i=l 

where the length α is a multiple of the block size β, i.e., α = βs, and

$$F^f = \bigoplus_{i=1}^{M} Z_{m_i}, \qquad F^b = \bigoplus_{i=1}^{N} Z_{n_i}.$$

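The expression in (23) can be rewritten as the block displacement representation

$$S = \sum_{i=1}^{s} K_\beta(X_i, F^f)\, K_\beta^T(Y_i, F^b), \qquad X_i \in \mathbb{R}^{m \times \beta},\; Y_i \in \mathbb{R}^{n \times \beta}, \tag{24}$$

where

$$K_\beta(X_i, F^f) = [X_i,\; (F^f)^{\beta} X_i,\; (F^f)^{2\beta} X_i,\; \cdots], \qquad K_\beta(Y_i, F^b) = [Y_i,\; (F^b)^{\beta} Y_i,\; (F^b)^{2\beta} Y_i,\; \cdots].$$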

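Furthermore, because $F^f$ and $F^b$ have M and N sections, respectively, we can write

$$K_\beta(X_i, F^f) = \begin{bmatrix} L_\beta(X_{1,i}, Z_{m_1}^{\beta}) & 0 \\ L_\beta(X_{2,i}, Z_{m_2}^{\beta}) & 0 \\ \vdots & \vdots \\ L_\beta(X_{M,i}, Z_{m_M}^{\beta}) & 0 \end{bmatrix}, \qquad K_\beta(Y_i, F^b) = \begin{bmatrix} L_\beta(Y_{1,i}, Z_{n_1}^{\beta}) & 0 \\ \vdots & \vdots \\ L_\beta(Y_{N,i}, Z_{n_N}^{\beta}) & 0 \end{bmatrix},$$

where $L_\beta(X, Z^\beta)$ is the block lower triangular Toeplitz matrix with first column block X. The matrix 0 denotes the null matrix of appropriate size such that $K_\beta(X_i, F^f)$ and $K_\beta(Y_i, F^b)$ are m × n and n × n matrices, respectively. The product $L_\beta(X, Z^\beta)b$ can be expressed as a sum of β products of scalar lower triangular Toeplitz matrices and vectors. As an example,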


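$$\begin{bmatrix} a_0 & c_0 & 0 & 0 \\ a_1 & c_1 & 0 & 0 \\ a_2 & c_2 & a_0 & c_0 \\ a_3 & c_3 & a_1 & c_1 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_0 & & & \\ a_1 & a_0 & & \\ a_2 & a_1 & a_0 & \\ a_3 & a_2 & a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ 0 \\ b_2 \\ 0 \end{bmatrix} + \begin{bmatrix} c_0 & & & \\ c_1 & c_0 & & \\ c_2 & c_1 & c_0 & \\ c_3 & c_2 & c_1 & c_0 \end{bmatrix} \begin{bmatrix} b_1 \\ 0 \\ b_3 \\ 0 \end{bmatrix} \tag{25}$$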

Now the multiplications on the right-hand side of (25) can be done by fast convolutions, and therefore so can the multiplication Sb.
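The identity behind (25) can be checked numerically. The Python sketch below forms $L_\beta(X, Z^\beta)$ explicitly (column block j is X shifted down by jβ rows, truncated to a square m × m matrix, which is an assumption about the truncation) and compares its product with b against the sum of β scalar lower triangular Toeplitz products. The scalar products are written as plain O(m²) loops for clarity; in the algorithm they would be fast convolutions. All function names are hypothetical.

```python
def lower_toeplitz_matvec(col, v):
    """y = L(col) v, where L(col) is the scalar lower triangular Toeplitz
    matrix with first column `col` (a truncated linear convolution)."""
    m = len(col)
    return [sum(col[i - j] * v[j] for j in range(i + 1)) for i in range(m)]


def block_matvec_direct(X, beta, b):
    """Form L_beta(X, Z^beta) explicitly and multiply it by b.
    X is given as m rows of length beta."""
    m = len(X)
    L = [[0] * m for _ in range(m)]
    for jb in range(0, m, beta):             # first column of block j
        for i in range(m - jb):
            for l in range(beta):
                if jb + l < m:
                    L[i + jb][jb + l] = X[i][l]
    return [sum(L[i][k] * b[k] for k in range(m)) for i in range(m)]


def block_matvec_split(X, beta, b):
    """Same product as a sum of beta scalar lower triangular Toeplitz
    products, as in (25): column l of X multiplies the vector that keeps
    b[j*beta + l] in position j*beta and has zeros elsewhere."""
    m = len(X)
    y = [0] * m
    for l in range(beta):
        col = [X[i][l] for i in range(m)]
        bl = [b[k + l] if k % beta == 0 and k + l < m else 0 for k in range(m)]
        y = [yi + ti for yi, ti in zip(y, lower_toeplitz_matvec(col, bl))]
    return y


# The shape of (25): columns a and c of X, vector b (numbers chosen arbitrarily).
X = [[1, 5], [2, 6], [3, 7], [4, 8]]
b = [1, -1, 2, 3]
print(block_matvec_direct(X, 2, b), block_matvec_split(X, 2, b))
# -> [-4, -4, 13, 18] [-4, -4, 13, 18]
```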

5.  Concluding  Remarks. 

We have presented $O(\alpha^2 n \log^2 n)$ algorithms for the determination of exact and least-squares solutions of linear systems with matrices having (generalized) displacement rank α. Such algorithms for exact solutions have been studied by several authors, most recently by Ammar and Gragg [2] for Toeplitz systems. They also made a very close study of the implementation of the convolution operation in an attempt to obtain the smallest coefficient; we have not attempted so close an analysis for the more general algorithm in our paper. Nor have we attempted a numerical error analysis of the algorithm; nevertheless, one might hope that numerical refinements devised for the Schur algorithm (see, e.g., Koltracht and Lancaster [16]) may be carried over to the divide-and-conquer framework as well.
APPENDIX


We shall summarize the explanation in Sec. 3 using a recursive procedure. First, note that the polynomial $\Theta_{p:q}(z)$ (and $\Psi_{p:q}(w)$) has q − p + 2 terms. The first column of $\Theta_{p:q}(z)$ has terms ranging from degree $z$ to $z^{q-p+1}$, and the other columns have terms from 1 to $z^{q-p}$. Hence, by shifting the first column by one position, we can store $\Theta_{p:q}(z)$ and $\Psi_{p:q}(w)$ in the array "Poly" in slots p through q inclusive:

Poly: array [1..α, 1..α, 0..MAX-1] of record
    θ: coefficient;
    ψ: coefficient
  end;

The computation of $\Theta_{p:q}(z)$ is sequential, i.e., once we have computed $\Theta_{p:r}(z)$, we no longer need to keep $\Theta_{p:r-1}(z)$; therefore, the array "Poly" can be kept as a single global variable.

The polynomial vector $X_{p:q}(z)$ has q − p + 1 terms and can therefore be stored in an array of type GENERATORS:

type
  GENERATORS = array [1..α, 0..MAX-1] of record
    x: coefficient;
    y: coefficient
  end;

However, $X_{p:q}(z)$ cannot be kept as a global variable; local copies must be maintained during each recursive call.

Now  we  can  describe  the  recursive  generalized  Schur  algorithm  as  follows. 

Algorithm  (Recursive  k-step  Generalized  Schur  Algorithm). 

Input: Generator of E, $\{X_0(z), Y_0(w)\}$; displacement operator $\{F^f, F^b\}$; number of steps, k.
Output: Generator of S, $\{X_k(z), Y_k(w)\}$.
procedure RecursiveSchur;
var
  G, LowerG: GENERATORS;
begin
  (* G holds the input generator {X_0(z), Y_0(w)} *)
  Find(0, k-1, G);
  Apply(0, k, n, G, LowerG);
  return (LowerG)
end

The procedure Find(p, q, G) computes $\Theta_{p:q}(z)$ and $\Psi_{p:q}(w)$ given $G = \{X_{p:q}(z), Y_{p:q}(w)\}$, and the procedure Apply(p, r, q, G, LowerG) returns LowerG = $\{X_{r:q}(z), Y_{r:q}(w)\}$ given the same $G = \{X_{p:q}(z), Y_{p:q}(w)\}$.
procedure Find(p, q: index; G: GENERATORS);
var
  r: index;
  LowerG: GENERATORS;   (* G is the parameter, so only LowerG is local *)
begin
  if p = q then begin
    Compute Θ_{p:p}(z) and Ψ_{p:p}(w);
    return
  end;
  r := appropriate integer close to (p+q)/2;
  Find(p, r-1, G);
  Apply(p, r, q, G, LowerG);
  Find(r, q, LowerG);
  (* Use fast convolution for polynomial products *)
  Θ_{p:q}(z) := Θ_{p:r-1}(z) Θ_{r:q}(z);
  Ψ_{p:q}(w) := Ψ_{p:r-1}(w) Ψ_{r:q}(w)
end

procedure Apply(p, r, q: index; G: GENERATORS; var LowerG: GENERATORS);
begin
  (* Use fast convolution for polynomial products *)
  X_{r:q}(z) := X_{p:q}(z) Θ_{p:r-1}(z);
  Y_{r:q}(w) := Y_{p:q}(w) Ψ_{p:r-1}(w);
  LowerG := {X_{r:q}(z), Y_{r:q}(w)};
  return (LowerG)
end
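The order in which Find and Apply visit the steps can be traced with a short Python sketch. It mirrors only the control flow of the procedures above (with r taken as ⌈(p+q)/2⌉, the balanced choice) and records each operation instead of performing the polynomial arithmetic; the function names and the tuple labels are hypothetical.

```python
def find(p, q, trace):
    """Mirror the control flow of Find(p, q, G)."""
    if p == q:
        trace.append(("leaf", p))             # compute Theta_{p:p}, Psi_{p:p}
        return
    r = (p + q + 1) // 2                      # ceil((p+q)/2), the balanced split
    find(p, r - 1, trace)
    trace.append(("apply", p, r, q))          # X_{r:q} := X_{p:q} Theta_{p:r-1}
    find(r, q, trace)
    trace.append(("combine", p, q))           # Theta_{p:q} := Theta_{p:r-1} Theta_{r:q}


def recursive_schur_trace(k, n):
    """Mirror RecursiveSchur: Find(0, k-1, G), then Apply(0, k, n, G, LowerG)."""
    trace = []
    find(0, k - 1, trace)
    trace.append(("apply", 0, k, n))
    return trace


for step in recursive_schur_trace(4, 10):
    print(step)
```

For k = 4 the trace alternates leaves, generator updates and Θ products, ending with the final Apply over all n rows; each step index is visited exactly once as a leaf.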
Fig. 1. Sequence of Computations for Example 4.
Fig. 2. Sequence of Computations for Example 5.
ABSTRACT. The free formulation of Bergan and Nygård (1984) has enjoyed considerable success in the construction of high-performance finite elements for linear and nonlinear structural analysis. In its original form the formulation combines nonconforming internal displacement assumptions with a specialized version of the patch test. Recent developments in fitting this formulation within a variational framework are described, and extensions opened up by these developments are discussed.

INTRODUCTION.  The  term  high-performance  finite  element  is  used  here  to 
collectively  identify  elements  that  are  developed  to  attain  the  following  goal: 


To  deliver  engineering  accuracy  with  coarse, 
arbitrary  meshes  of  simple  elements 


The fulfillment of this goal gives rise to a myriad of requirements, which are to be addressed to a greater or lesser degree during element development. These requirements are listed in Table 1.

Some  of  these  requirements  are  obvious.  For  example,  low  distortion  sensitivity  is  an 
immediate  consequence  of  trying  to  achieve  satisfactory  accuracy  with  arbitrary  meshes. 
But  other  items  in  Table  1  require  some  explanation. 

A key requirement is that the element be as simple as possible. It should be observed that this is in sharp contrast to trends of the late 1960s and 1970s that lauded higher-order elements and culminated in the development of very complex models, including elements with nonphysical degrees of freedom. One primary source of this “backlash” is feedback from users of general-purpose finite element programs. As use of these programs expands to more engineers without deep knowledge of “what’s inside the black box,” the overwhelming preference in model construction is to select the “simplest elements that will do the job” among those available in the program.
Table 1. Target Requirements for High-Performance Elements


•  Simple;  few  freedoms,  all  physical 

•  Frame  invariant 

•  No  locking 

•  Rank  sufficient:  no  spurious  modes 

•  Balanced  stiffness:  not  too  rigid,  not  too  flexible 

•  Stresses  as  accurate  as  displacements 

•  Low  distortion  sensitivity 

•  Mixable  with  other  elements 

•  Economical  to  form 

•  Easily  extendible  to  nonlinear  analysis 


The balanced stiffness demand also deserves some comment. It follows from the goal of attaining reasonable accuracy with coarse meshes. This is illustrated in Figure 1, which shows a convergence study of a classical model problem: the bending of a simply-supported square plate under a concentrated central load. The mesh contains N × N elements over a plate quadrant. An “accuracy band” of ±1% is taken, somewhat arbitrarily, as representative of engineering accuracy for this rather simple problem. The convergence characteristics of several triangular elements are taken from the extensive study of Batoz, Bathe and Ho (1980). Although most elements converge, some are too stiff while others are too flexible, and generally they do not enter the accuracy band until the mesh is fairly refined (N > 8). On the other hand, the results labelled ‘FF’, obtained with a plate element based on the free formulation (FF) discussed later, lie within the band for all meshes.

The balanced-stiffness requirement should not be confused with fast asymptotic convergence for fine meshes. Simple elements cannot compete with higher-order elements in this regard. What is important is how good the results are for coarse meshes.

THEME AND TOOLS. Many researchers are presently working to develop such elements. The common theme of the investigations is


Abandon  the  conventional  displacement  formulation 


Figure 1. Convergence study of several plate bending finite elements as reported in Batoz et al. (1980). The FF results are from Felippa and Bergan (1987).

Various tools used by these researchers in their quest for high-performance elements are listed in Table 2. It can be observed that many of these were introduced over 20 years ago, but only now is a concerted effort being made to combine several tools to forge superior products.

The  present  paper  focuses  on  one  of  the  possible  approaches  to  the  construction  of 
high-performance  elements.  This  approach  is  based  on  the  free  formulation  (FF). 
Table 2. Tools of the Trade

Technique                                    Year introduced
• Incompatible shape functions               1964
• Patch test                                 1965
• Mixed and hybrid variational principles    1965
• Projectors                                 1967
• Reduced and selective integration          1969
• Assumed strains                            1970
• Energy balancing                           1974
• Limit differential equations               1982


THE FREE FORMULATION. In the early 1980s Bergan and Nygård developed the free formulation (FF) for the construction of displacement-based, incompatible finite elements. This work, published in Bergan and Nygård (1984), consolidated a decade of research by Bergan and coworkers at Trondheim, milestones of which may be found in Bergan and Hanssen (1976), Hanssen et al. (1979) and Bergan (1980). The products of this research have been finite elements of high performance, especially for plates and shells. Linear applications are reported in the aforementioned papers as well as in Bergan and Wang (1984), Bergan and Felippa (1985) and Felippa and Bergan (1987), whereas nonlinear applications are presented in Bergan and Nygård (1985, 1988) and Nygård (1986).

The basic concept is that the element stiffness matrix can be decomposed into two parts:

(1)  $K = K_b + K_h$

where

$K_b$  the basic stiffness matrix, which is constructed for convergence;

$K_h$  the higher-order stiffness matrix, which is constructed for stability and accuracy.

The decomposition (1) may be interpreted at the assembled or master-stiffness equation level as the force decomposition

(2)  $Kv = (K_b + K_h)v = f_b + f_h = f$

where v and f are the vectors of nodal displacements and assembled nodal forces, respectively. An FF postulate is that as the mesh size decreases and the solution converges, $K_b v$ dominates.

The original FF was based on nonconforming displacement assumptions, the principle of virtual work and a specialized form of Irons’ patch test that Bergan and Hanssen (1976) called the individual element test. The basic and higher-order stiffness matrices are constructed in a largely independent fashion by following the procedures outlined below.

CONSTRUCTION OF BASIC STIFFNESS MATRIX. The main steps are outlined below in “recipe” form; for justification the reader is referred to the references listed above.


Step 1. Assume a constant stress, σ, inside the element.

Step 2. Assume boundary displacements, d, over the element boundary B. This field is described in terms of element node displacements v as

(3)  $d = Vv$

where V is an array of boundary shape functions. The boundary motion must satisfy interelement continuity and contain rigid-body and constant-strain motions exactly.

Step 3. Construct the “lumping” matrix

(4)  $L = \int_B V^T n \, dB$

that consistently “lumps” the boundary tractions σ·n associated with σ into element node forces, f, conjugate to v. That is, $f = L\sigma$.

Step 4. The basic stiffness matrix is

(5)  $K_b = \frac{1}{v} L E L^T$

where E is the stress-strain constitutive matrix, assumed to be constant over the element, and $v = \int_V dV$ denotes the element volume.
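As a sanity check of Steps 1 to 4, consider the simplest case we can think of, which is not taken from the paper: a two-node bar of length ℓ, cross-section A and modulus E. The boundary B consists of the two end sections, with outward normals n = −1 and n = +1, so (4) gives L = A·[−1, +1]ᵀ and (5) should reproduce the familiar bar stiffness (EA/ℓ)[[1, −1], [−1, 1]]. The function name is hypothetical.

```python
def bar_basic_stiffness(E, A, length):
    """Basic stiffness K_b = (1/v) L E L^T for a 2-node bar (illustration).

    Constant axial stress sigma; boundary B = the two end cross-sections,
    each of area A, with outward normals n = -1 (node 1) and n = +1 (node 2).
    V evaluated at node 1 is [1, 0] and at node 2 is [0, 1], so
    L = integral over B of V^T n dB = A * [-1, +1]^T.
    """
    L = [-A, A]                 # lumping matrix (2 x 1), f = L * sigma
    v = A * length              # element volume
    return [[L[i] * E * L[j] / v for j in range(2)] for i in range(2)]


print(bar_basic_stiffness(2.0, 3.0, 6.0))   # EA/l = 1 -> [[1.0, -1.0], [-1.0, 1.0]]
```

Because a linear bar has no higher-order displacement modes, $K_h$ vanishes here and $K_b$ alone is the exact element stiffness.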


CONSTRUCTION  OF  HIGHER-ORDER  STIFFNESS  MATRIX.  Again 
the  key  steps  are  outlined  below  in  “how  to  do  it”  form. 
Step 1. The same compatible boundary displacements used in constructing $K_b$ are assumed:

(6)  $d = Vv$

Step 2. Assume an internal displacement field over the element volume V:

(7)  $u = Nq = N_r q_r + N_c q_c + N_h q_h$

where the three terms represent rigid motion, constant-strain and higher-order displacements, respectively, array N collects shape functions and q collects generalized coordinates. This assumption satisfies the following conditions:

(a) linear independence with respect to v,

(b) the dimensions of vectors q and v are the same,

(c) the rigid motions and constant-strain fields are complete,

(d) (optional but recommended) the higher-order displacements are energy orthogonal to the constant-strain displacements.

The associated internal strains are:

(8)  $e = Bu = e_c + e_h = B_c q_c + B_h q_h$

since the rigid-body strains, $B_r q_r$, must vanish.

Step 3. Construct the square nonsingular transformation

(9)  $v = Gq$

which inverted gives:

(10)  $q = \begin{bmatrix} q_r \\ q_c \\ q_h \end{bmatrix} = Hv = \begin{bmatrix} H_r \\ H_c \\ H_h \end{bmatrix} v$

Step 4. The higher-order stiffness matrix is

(11)  $K_h = H_h^T K_{qh} H_h$, where $K_{qh} = \int_V B_h^T E B_h \, dV$

is the generalized stiffness in terms of the q coordinates.
Table 3. Elements Developed with FF

Type                               Shape                  Dofs
Kirchhoff plates                   Triangles (several)    9
                                   Quadrilaterals         12
Membrane with drilling freedoms    Triangle               9
                                   Quadrilateral          12
Shells                             Triangle               18
                                   Quadrilateral
SCALING THE HIGHER-ORDER STIFFNESS. In more recent work (see Bergan and Felippa (1985) and subsequent papers) the concept of scaling the higher-order stiffness was introduced. A one-parameter scaling generalizes (1) to

(12)  $K = K_b + (1-\gamma)K_h$

where γ < 1 is a scalar. If γ = 0 one recovers (1), but higher accuracy for coarse meshes may be obtained by adjusting the value of γ. (This value may vary from element to element.) Multiparameter scaling is discussed in Felippa and Bergan (1987) for a specific plate bending element.

APPLICATIONS. Table 3 lists elements that have been developed using the FF as of this writing. The major code in which these elements have been implemented is FENRIS, developed in collaboration between the Norwegian Institute of Technology (NIT) at Trondheim, SINTEF and Det Norske Veritas; see Bergan et al. (1984). FENRIS has been used primarily in the analysis of nonlinear marine structures such as offshore drilling platforms. Table 4 lists the major application problems to which these elements have been applied.


Table  4.  Application  Problems 


•  Linear  plate/shell  analysis 

•  Geometrically  nonlinear  plate/shell  analysis 
(corotational  formulation) 

•  Materially  nonlinear  plates  and  shells 
VARIATIONAL FORMULATION. An intriguing question has been: does the FF fit in a variational framework? This was partly answered by Bergan and Felippa (1985), who showed that the basic stiffness part was equivalent to a constant-stress hybrid element. But persistent efforts by the author to encompass the higher-order stiffness within a hybrid variational principle were unsuccessful until the development of parametrized mixed-hybrid functionals in Felippa (1988a, 1988b). With the help of these more general functionals it is possible to show that the FF is a very special type of mixed-hybrid element which does not fit within the classical Hellinger-Reissner principle. In retrospect the classification of FF elements as hybrids is not surprising. Under mild conditions studied in Felippa (1988c), hybrid elements satisfy Irons’ patch test a priori, and the FF development has been founded on that premise.

To  encompass  the  FF  within  the  hybrid  framework,  the  following  assumptions  must 
be  invoked. 


Assumption 1. A non-standard hybrid functional, identified as $\Pi_\gamma$ in Felippa (1988b), is constructed. This functional depends linearly on a parameter γ. This parameter “interpolates” between the minimum potential energy functional and the Hellinger-Reissner functional, which are obtained for γ = 0 and γ = 1, respectively.

Assumption 2. Three fields are assumed over each element:

(a) a constant stress field,

(b) an internal displacement field u defined by $n_q$ generalized coordinates collected in vector q, and

(c) a boundary displacement field d defined by $n_v$ nodal displacements collected in vector v. Both d and u must represent rigid body motions and constant strain states exactly.

Assumption 3. The number of generalized coordinates, $n_q$, equals the number of nodal displacements, $n_v$, and the square transformation matrix G relating v = Gq is nonsingular.


The last two assumptions are precisely those invoked in the construction of $K_h$ as discussed previously. The first one defines the variational principle and accounts for the higher-order stiffness scaling.

In Felippa (1988b) it is shown that substituting the finite element expansions into $\Pi_\gamma$, rendering the functional stationary with respect to the degrees of freedom, and eliminating both internal fields by a combination of static condensation and kinematic constraints leads to the scaled FF stiffness equations (12) in terms of the nodal displacements v. The parameter γ appears in the coefficient (1 − γ) of the higher-order stiffness. These stiffness equations can be readily implemented in any displacement-based finite element code.
CONCLUDING REMARKS. Why is the FF variational formulation deemed useful? There are several reasons:

1. It explains the behavior of FF elements as regards convergence, stability and accuracy.

2. It opens up the door to extensions that are not obvious from a physical standpoint. Two such extensions involve retaining higher-order stress fields and allowing more internal displacement modes than nodal displacements, that is, the dimension of vector q in (7) exceeds that of v in (6). These extensions are studied in Felippa (1988c).

3. It supplies foundations for local error estimation and adaptive mesh refinement.

4. It facilitates the construction of “designer elements” needed for applications such as stress, stability and vibration analysis of advanced laminate-composite structures. Such elements may combine the three ingredients of internal statics, internal kinematics and boundary kinematics in harmonious synergy to satisfy special behavior requirements.
Abstract

Reversible plasticity is modeled with a stress-strain law having a distinct yield point. Expressions are derived for the element tangent stiffness matrix and load vector of a largely deformed cubic-cubic shell element by discrete integration. Numerical experiments are carried out using the orthogonal trajectory technique to trace sequences of equilibrium configurations for high loads.


1.  Introduction 


Very thin shells share a numerically troublesome property with nearly incompressible elasticity [1,2]: their stiffness matrix seriously declines in condition as the thickness is reduced. Enforcement of the emerging condition of C¹ continuity (incompressibility in elastomers) creates an imbalanced elastic energy expression consisting of two terms that are out of proportion. In such cases the corresponding stiffness matrix is the linear combination of two matrices with widely separated coefficients, and consequently has an eigenvalue spectrum consisting of two widely separated groups. The lowest eigenvalue of the global stiffness matrix is related to the fundamental frequency of the elastic system and is therefore only slightly affected by the large parameter, but the largest eigenvalue grows without bound, causing the decline in conditioning.

Direct solution methods for the linear, or linearized, stiffness equation are operationally unaffected by ill-conditioning, but the convergence of the Newton-Raphson method, the fundamental solution procedure for all nonlinear finite element equilibrium problems, is measurably influenced by out-of-balance elastic energy expressions. Moreover, merely storing such ill-conditioned algebraic systems is an immediate cause of round-off errors and loss of numerical significance.

Iterative methods for the solution of the linearized stiffness equation, such as conjugate gradients, are most attractive for the very large discrete systems set up by finite elements, but unlike direct methods they show great operational sensitivity to the spectral span of the global stiffness matrix and may lose all convergence properties in the presence of ill-conditioning, which renders them useless.
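The sensitivity of conjugate gradients to the spectral span can be seen in a toy experiment. The Python sketch below runs textbook CG on a diagonal symmetric positive definite system whose eigenvalues are spread uniformly over [1, κ] and counts the iterations needed to reduce the residual by a factor of 10⁻⁸. The uniform spectrum is an illustrative assumption, not a model of an actual shell stiffness matrix, and the helper names are hypothetical.

```python
def cg_iterations(eigs, tol=1e-8, maxiter=10000):
    """Run conjugate gradients on diag(eigs) x = (1, ..., 1) from x = 0 and
    return the number of iterations needed to reduce the residual norm by
    a factor `tol` (or maxiter if that never happens)."""
    n = len(eigs)
    x = [0.0] * n
    r = [1.0] * n                       # residual b - A x for x = 0
    p = list(r)
    rr = sum(ri * ri for ri in r)
    r0 = rr ** 0.5
    for it in range(1, maxiter + 1):
        Ap = [e * pi for e, pi in zip(eigs, p)]
        alpha = rr / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = sum(ri * ri for ri in r)
        if rr_new ** 0.5 <= tol * r0:
            return it
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return maxiter


def spread(kappa, n=200):
    """n eigenvalues spread uniformly over [1, kappa] (condition number kappa)."""
    return [1.0 + (kappa - 1.0) * i / (n - 1) for i in range(n)]


for kappa in (1e1, 1e3, 1e5):
    print(f"condition number {kappa:.0e}: {cg_iterations(spread(kappa))} iterations")
```

The iteration count grows with the condition number even though the system size is fixed; in exact arithmetic CG would still terminate in at most n steps, but in floating point the growth is what a user observes.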

A possible way out of the numerical instability caused by the imbalanced energy of nearly incompressible elasticity and thin shells is a multiparameter technique, whereby a well-conditioned system is set up for a shell of moderate thickness and is comfortably and accurately solved to provide an initial guess for the next, less well-conditioned system. Extrapolation to the limit in the disturbing parameter may be carried out to complete the computations to the needed accuracy. This technique is important, but we shall not deal with it in the present paper; it deserves a separate discussion.

The issue of plasticity is more central to our present discussion. Irreversible plasticity is extremely expensive and computationally complicated, being dissipative and incremental. In our opinion, such plastic formulations should be reserved for elasto-plastic problems in which a clear phenomenological merit is established for irreversibility. Otherwise, reversible plasticity, namely an analytic nonlinear constitutive law with a distinct yield point [3] and a drastically reduced (even to zero) elastic modulus, should be amply adequate and sufficiently revealing, both physically and mathematically.

This paper is devoted to combining three computational elements into a computational procedure [4] for the large elasto-plastic deformation of thin shells of revolution: (1) the creation, by discrete numerical integration, of a cubic-cubic element stiffness matrix and load vector for a largely displaced and highly strained shell of revolution; (2) the incorporation of reversible plasticity into the shell element program; and (3) the adaptation of the orthogonal trajectory technique to trace load-displacement branches. In addition, (4) numerical computations are performed to validate the theory and show its practicality.

2.  Load-displacement  tracking 

Load-displacement tracking is an integral part of any nonlinear finite element program. Newton-Raphson techniques are the most widely used solution procedures for the nonlinear stiffness equation. They are fast and usually reliable as long as the sought equilibrium configuration is far from a critical, or turning, point. To account for such singularities, we need to modify the Newton-Raphson iterations to include variation of both displacement and load.

To fix ideas, we shall first present the orthogonal trajectory method of nonlinear continuation for the single, implicit equilibrium curve

$$r(x, \lambda) = 0 \tag{1}$$

in which x is displacement and λ load. To trace λ vs. x we need to compute closely spaced pairs x, λ that satisfy eq. (1).

Tracing the x, λ curve consists of two distinct stages: prediction and correction. Prediction is the stage in which we move from a previously established equilibrium point A to a new guess at point B, usually a distance s along the tangent line at A, as in Fig. 1(a).

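If $A(x_0, \lambda_0)$ is an equilibrium point and $B(x_1, \lambda_1)$ lies on the tangent so that AB = s, then linearization of $r(x_0 + \delta x, \lambda_0 + \delta\lambda) = 0$, with $\delta x = x_1 - x_0$ and $\delta\lambda = \lambda_1 - \lambda_0$, produces

$$r_0 + r'_0\,\delta x + \dot r_0\,\delta\lambda = 0, \qquad \delta x^2 + \delta\lambda^2 = s^2 \tag{2}$$

where $r'_0$ is $\partial r/\partial x$ at A and $\dot r_0$ is $\partial r/\partial \lambda$ at A. Since A is an equilibrium point, $r_0 = 0$. Hence

$$x_1 = x_0 \mp \frac{s\,\dot r}{\sqrt{r'^2 + \dot r^2}}, \qquad \lambda_1 = \lambda_0 \pm \frac{s\,r'}{\sqrt{r'^2 + \dot r^2}}, \tag{3}$$

where the choice of sign determines the direction, and where the subscript zero is omitted for typographical brevity.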
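The predictor (3) is short enough to state directly in code. The Python sketch below is an illustration with hypothetical names: the caller supplies the derivatives r′ and ṙ at the equilibrium point, and `sign` selects the direction of travel.

```python
import math


def predict(x0, lam0, dr_dx, dr_dlam, s, sign=+1):
    """Tangent predictor (3): step a distance s along the tangent of
    r(x, lam) = 0 at the equilibrium point (x0, lam0).

    dr_dx and dr_dlam are r' and r-dot evaluated at (x0, lam0);
    sign (+1 or -1) selects the direction of travel along the curve.
    """
    norm = math.hypot(dr_dx, dr_dlam)
    x1 = x0 - sign * s * dr_dlam / norm
    lam1 = lam0 + sign * s * dr_dx / norm
    return x1, lam1


# Example on the curve r = 8x(1 - x) - lam at its equilibrium point (0.25, 1.5),
# where r' = 8 - 16x = 4 and r-dot = -1.
print(predict(0.25, 1.5, 4.0, -1.0, 0.1))
```

By construction the step (x₁ − x₀, λ₁ − λ₀) is orthogonal to the gradient (r′, ṙ) and has length s, which are exactly the two conditions in (2) with r₀ = 0.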

In vector form we write the total potential energy as $\pi(x, \lambda)$ and the system of stiffness equations as $r(x, \lambda) = \partial\pi/\partial x$, the gradient of π. Linearization is here of the form

$$\frac{\partial r}{\partial x}(x_1 - x_0) + (\lambda_1 - \lambda_0)\frac{\partial r}{\partial \lambda} = 0 \tag{4}$$

where $K = \partial r/\partial x$ is the global stiffness matrix and $p = \partial r/\partial\lambda$ is the load vector. Equation (4) is a vector equation. With the constraint on the traveled distance

$$(x_1 - x_0)^T(x_1 - x_0) + (\lambda_1 - \lambda_0)^2 = s^2 \tag{5}$$

linearization leads to

$$\lambda_1 = \lambda_0 \pm s\left(1 + p_0^T K_0^{-1} K_0^{-1} p_0\right)^{-1/2}, \qquad x_1 = x_0 - \delta\lambda\, K_0^{-1} p_0, \tag{6}$$

with $\delta\lambda = \lambda_1 - \lambda_0$, which completes the prediction.
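A minimal Python sketch of the vector predictor (6), assuming a small dense tangent stiffness K given as nested lists; the linear solve is plain Gaussian elimination with partial pivoting rather than anything a production finite element code would use, and the function names are hypothetical. The code uses |K⁻¹p|², which equals pᵀK⁻¹K⁻¹p for the symmetric K of (6).

```python
import math


def solve(K, b):
    """Solve K y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(K, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][j] * y[j] for j in range(i + 1, n))) / M[i][i]
    return y


def predict_vector(x0, lam0, K, p, s, sign=+1):
    """Vector predictor (6): K (x1 - x0) + (lam1 - lam0) p = 0 together with
    |x1 - x0|^2 + (lam1 - lam0)^2 = s^2, where K is the tangent stiffness
    and p = dr/dlam at the equilibrium point (x0, lam0)."""
    b = solve(K, p)                                   # K^{-1} p
    dlam = sign * s / math.sqrt(1.0 + sum(bi * bi for bi in b))
    x1 = [xi - dlam * bi for xi, bi in zip(x0, b)]
    return x1, lam0 + dlam


print(predict_vector([0.0, 0.0], 0.0, [[2.0, 0.0], [0.0, 4.0]], [2.0, 4.0], math.sqrt(3.0)))
```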

Starting with the predicted initial guess $(x_0, \lambda_0)$, we set out to approach the equilibrium curve. This can be done with a Newton-Raphson iterative method in which:

(1) The load λ is held constant, as in Fig. 1(b). The failure of such an iterative scheme near a limit point is clear.

(2) The load λ is linearly related to the displacement x, as in Fig. 1(c). If the linear constraint misses the equilibrium curve, this technique does not converge either.

(3) The load λ is related to the displacement x by the condition that the iterated points lie on a circle of radius s centered at the previous equilibrium point, as in Fig. 1(d).

(4) The load λ and displacement x are constrained to follow an orthogonal trajectory to the equilibrium curve, as in Fig. 2.

In the orthogonal trajectory accession technique [5] the ultimate correction step is orthogonal
to the equilibrium curve. Load and displacement enter symmetrically in this algorithm,
which is therefore indifferent to critical points and turning points.

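Analytically, we add to the linearized equation r + r'δx + ṙδλ = 0 the orthogonality
condition

$$ d\lambda = \dot r\,(r')^{-1}\,dx \qquad (7) $$

which makes the step parallel to the gradient (r', ṙ) of r, and hence normal to the curves r = const., and obtain the corrections

$$ \delta\lambda = -\frac{r\,\dot r}{r'^2 + \dot r^2}, \qquad \delta x = -\frac{r\,r'}{r'^2 + \dot r^2} \qquad (8) $$

in which the right-hand sides include computed values only, and where δλ = λ₁ − λ₀ and
δx = x₁ − x₀.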

In the multidimensional case x stands for the displacement vector, so that r(x, λ) = 0 is
a set of nonlinear stiffness equations. Here r' = K is the global tangent, displacement-dependent
stiffness matrix, and ṙ = p is the global nonlinear load vector. Linearization
is here of the form r + Kδx + pδλ = 0, or r + K dx + p dλ = 0, since we are dealing with
differentials rather than differences, and the orthogonality conditions assume the matrix-vector
form

$$ d\lambda = \dot r\,(r')^{-1} dx = p^T K^{-1} dx, \qquad dx = -K^{-1}(r + d\lambda\, p) \qquad (9) $$


producing finally the corrections

$$ d\lambda = -\frac{p^T K^{-1} K^{-1} r}{1 + p^T K^{-1} K^{-1} p}, \qquad dx = -K^{-1}(r + d\lambda\, p) \qquad (10) $$
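Equation (10) needs only two solves with K per correction. The following is a sketch, assuming a symmetric K (so that pᵀK⁻¹ = (K⁻¹p)ᵀ) and hypothetical 3-dof data. The name `orthogonal_correction` is illustrative.

```python
import numpy as np

def orthogonal_correction(K, p, r):
    """One orthogonal-trajectory correction, eq. (10).

    K = dr/dx (taken symmetric, as a stiffness matrix), p = dr/dlambda,
    r = current residual vector. Returns (dx, dlam).
    """
    y = np.linalg.solve(K, p)          # K^{-1} p
    z = np.linalg.solve(K, r)          # K^{-1} r
    # With K symmetric, p^T K^{-1} K^{-1} r = (K^{-1} p)^T (K^{-1} r) = y^T z
    dlam = -(y @ z) / (1.0 + y @ y)
    dx = -(z + dlam * y)               # dx = -K^{-1}(r + dlam p)
    return dx, dlam

# Hypothetical data: symmetric K, load vector p, residual r
K = np.array([[5.0, 1.0, 0.0], [1.0, 4.0, 0.5], [0.0, 0.5, 3.0]])
p = np.array([-1.0, 0.0, -0.5])
r = np.array([0.2, -0.1, 0.05])
dx, dlam = orthogonal_correction(K, p, r)
print(dx, dlam)
```

For a single degree of freedom the same function returns the scalar corrections (8).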


Figure 3 shows the orthogonal accession algorithm applied to the equilibrium equation
r = 8x(1 − x) − λ = 0 with a variety of starting points of the form (0, λ). The algorithm never
fails, and it shows a penchant for hitting the limit point of the curve; the limit point
is a special attractor for the method. Notice that if the initial guess lies on a normal to the
equilibrium curve, then convergence is in one step. Otherwise, it is not easy to find an initial
guess requiring more than six steps to converge.
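The scalar corrections (8) are easy to try on the Fig. 3 test equation, read here as r = 8x(1 − x) − λ, which has its limit point at x = 1/2, λ = 2. The sketch below repeats (8) from several starting points (0, λ). The function name `orthogonal_accession`, the tolerance and the iteration cap are illustrative choices, not taken from the paper.

```python
def orthogonal_accession(r, r_x, r_lam, x, lam, tol=1e-12, max_iter=50):
    """Repeat the scalar orthogonal-trajectory corrections of eq. (8).

    r, r_x, r_lam are callables returning r, dr/dx and dr/dlambda.
    Returns (x, lam, steps) once |r(x, lam)| <= tol.
    """
    for step in range(max_iter):
        res = r(x, lam)
        if abs(res) <= tol:
            return x, lam, step
        a, b = r_x(x, lam), r_lam(x, lam)
        d = a * a + b * b          # r'^2 + rdot^2
        x -= res * a / d           # delta x      = -r r'   / (r'^2 + rdot^2)
        lam -= res * b / d         # delta lambda = -r rdot / (r'^2 + rdot^2)
    raise RuntimeError("orthogonal accession did not converge")

# Test equation of Fig. 3 (as read here): r = 8x(1 - x) - lambda
r = lambda x, lam: 8.0 * x * (1.0 - x) - lam
r_x = lambda x, lam: 8.0 - 16.0 * x
r_lam = lambda x, lam: -1.0

results = {lam0: orthogonal_accession(r, r_x, r_lam, 0.0, lam0)
           for lam0 in (-1.0, 0.5, 1.0, 3.0)}
for lam0, (x, lam, steps) in results.items():
    print(f"start (0, {lam0}): x = {x:.6f}, lambda = {lam:.6f}, steps = {steps}")
```

In this sketch the start (0, 3), which lies above the limit load, ends next to the limit point, consistent with the attraction described above.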
Nonlinear elastic equilibrium problems are replete with critical points, and a modified
Newton-Raphson method must be used on these problems to pass limit and
turning points successfully. All the present computations were done with this algorithm.


3.  Reversible  plasticity 


All we need to model reversible plasticity is a stress-strain law that exhibits bilinear
behavior with a clearly defined yield point. It should also include a parameter to adjust the
slope in the plastic range. We suggest relating the stress σ to the strain ε by


$$ \sigma = \frac{\alpha\epsilon}{\cdots} \qquad (11) $$


in which α, β and p are parameters. To obtain the modulus of elasticity for the law in eq.
(11) we differentiate σ with respect to ε, dσ/dε = E. Figures 4 and 5 show σ = σ(ε) and
σ' = σ'(ε) as functions of the parameter p. As p increases, law (11) approaches
perfect plasticity. Notice in Fig. 5 the existence of a well-defined yield point at σ/β = 1.


4.  Cubic-Cubic  shell  of  revolution  element 

As for the plastica [6,7], here too we may write the stiffness matrix as for the elastic
shell, except that we have to bear in mind that the elastic modulus E is strain dependent.

Let r = r(η) and z = z(η) be the parametric equations of the generating curve for the
deformed shell. Referred to the Cartesian coordinate system oxyz,

$$ x = r(\eta)\cos\theta, \qquad y = r(\eta)\sin\theta, \qquad z = z(\eta) \qquad (12) $$

where, obviously, x² + y² = r². It is helpful to introduce the angle φ measured between the
positive r-axis and the tangent to the generating middle curve. The position vector $\mathbf{r}$ and
the unit normal vector $\mathbf{n}$ to a point on the middle surface become, with φ,

$$ \mathbf{r} = (r\cos\theta,\ r\sin\theta,\ z)^T, \qquad \mathbf{n} = (\sin\phi\cos\theta,\ \sin\phi\sin\theta,\ -\cos\phi)^T \qquad (13) $$

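In the same way, the position vector to a material point on $\mathbf{n}$ at a distance ζ from the middle
surface is

$$ \mathbf{q} = \mathbf{r} + \zeta\mathbf{n}, \qquad d\mathbf{q} = d\mathbf{r} + d\zeta\,\mathbf{n} + \zeta\,d\mathbf{n} \qquad (14) $$

Since $d\mathbf{r}$ lies in the tangent plane and $\mathbf{n}^T\mathbf{n} = 1$, we have $\mathbf{n}^T d\mathbf{r} = 0$ and $\mathbf{n}^T d\mathbf{n} = 0$, and an
arc element $ds^2 = d\mathbf{q}^T d\mathbf{q}$ becomes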


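$$ ds^2 = d\mathbf{r}^T d\mathbf{r} + \zeta^2\, d\mathbf{n}^T d\mathbf{n} + d\zeta^2 + 2\zeta\, d\mathbf{r}^T d\mathbf{n} \qquad (15) $$

in which

$$ d\mathbf{r} = \frac{\partial\mathbf{r}}{\partial\eta}d\eta + \frac{\partial\mathbf{r}}{\partial\theta}d\theta, \qquad d\mathbf{n} = \frac{\partial\mathbf{n}}{\partial\eta}d\eta + \frac{\partial\mathbf{n}}{\partial\theta}d\theta \qquad (16) $$

Because the η = const. and θ = const. curves are orthogonal,

$$ \frac{\partial\mathbf{r}^T}{\partial\eta}\frac{\partial\mathbf{r}}{\partial\theta} = \frac{\partial\mathbf{n}^T}{\partial\eta}\frac{\partial\mathbf{n}}{\partial\theta} = \frac{\partial\mathbf{r}^T}{\partial\eta}\frac{\partial\mathbf{n}}{\partial\theta} = \frac{\partial\mathbf{r}^T}{\partial\theta}\frac{\partial\mathbf{n}}{\partial\eta} = 0 \qquad (17) $$

and we are left with

$$ d\mathbf{r}^T d\mathbf{r} = a^2 d\eta^2 + r^2 d\theta^2, \qquad d\mathbf{n}^T d\mathbf{n} = \phi'^2 d\eta^2 + \sin^2\phi\, d\theta^2, \qquad d\mathbf{r}^T d\mathbf{n} = a\phi'\, d\eta^2 + r\sin\phi\, d\theta^2 \qquad (18) $$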

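where prime denotes differentiation with respect to η, and where

$$ \sin\phi = \frac{z'}{a}, \qquad \cos\phi = \frac{r'}{a}, \qquad \phi' = \frac{r'z'' - z'r''}{a^2}, \qquad a = (r'^2 + z'^2)^{1/2} \qquad (19) $$

Finally

$$ ds^2 = (a + \zeta\phi')^2 d\eta^2 + (r + \zeta\sin\phi)^2 d\theta^2 + d\zeta^2 \qquad (20) $$

written for the undeformed shell as

$$ ds_0^2 = (a_0 + \zeta\phi_0')^2 d\eta^2 + (r_0 + \zeta\sin\phi_0)^2 d\theta^2 + d\zeta^2 \qquad (21) $$

under the simplification ζ = ζ₀.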

Strain is obtained from the ratio of the two quadratic forms ds² and ds₀² as

$$ \epsilon_1(\zeta) = \frac{(a - a_0) + \zeta(\phi' - \phi_0')}{a_0 + \zeta\phi_0'}, \qquad \epsilon_2(\zeta) = \frac{(r - r_0) + \zeta(\sin\phi - \sin\phi_0)}{r_0 + \zeta\sin\phi_0} \qquad (22) $$

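Integration of the elastic energy with respect to ζ yields, after some obvious simplifications,

$$ \Pi = \pi\int E(\epsilon, \nu)\left[t\,(\epsilon_1^2 + 2\nu\epsilon_1\epsilon_2 + \epsilon_2^2) + \frac{t^3}{12}(\kappa_1^2 + 2\nu\kappa_1\kappa_2 + \kappa_2^2)\right] a_0 r_0\, d\eta \qquad (23) $$

in which ν is the Poisson ratio, which is left independent of the strain, and where t is the shell
thickness. Also

$$ \epsilon_1 = \frac{a}{a_0} - 1, \qquad \epsilon_2 = \frac{r}{r_0} - 1, \qquad \kappa_1 = \frac{\phi' - \phi_0'}{a_0}, \qquad \kappa_2 = \frac{\sin\phi - \sin\phi_0}{r_0} \qquad (24) $$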

Equations (11) and (23) form the basis for the derivation of the element data for the shell.
But before getting on with the matrix and vector derivation of the shell element, we need
briefly to consider some conventions for differentiation with respect to a vector.
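Let f(x) = f(x₁, x₂, ..., xₙ) be a scalar function of the vector argument x. We define
the differentiation of f with respect to the vector x as

$$ f' = \frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_i}\right), \qquad f'' = \frac{\partial^2 f}{\partial x\,\partial x^T} = \left(\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right) \qquad (25) $$

Notice that f' is a vector and f'' is a symmetric matrix.

Obviously,

$$ (f + g)' = f' + g', \qquad (cf)' = cf', \qquad (fg)' = gf' + fg' \qquad (26) $$

but

$$ (gf')' = gf'' + f'g'^T \qquad (27) $$

where

$$ f'g'^T = \left(\frac{\partial f}{\partial x_i}\frac{\partial g}{\partial x_j}\right) $$

is a nonsymmetric matrix. The matrix

$$ (fg)'' = gf'' + fg'' + f'g'^T + g'f'^T \qquad (28) $$

is symmetric. Now we have all that is needed for the discrete integration of the total potential
energy and the formation of the element data.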
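Rule (28) can be checked numerically. The sketch below assumes f(x) = ½xᵀAx with A symmetric and g(x) = bᵀx + c. These are hypothetical choices whose f', f'', g' and g'' are known in closed form. It then compares (28) with a central-difference Hessian of the product fg.

```python
import numpy as np

# f(x) = 0.5 x^T A x  ->  f' = A x,  f'' = A   (A symmetric)
# g(x) = b^T x + c    ->  g' = b,    g'' = 0
A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 3.0]])
b = np.array([1.0, -2.0, 0.5])
c = 0.7
x = np.array([0.3, -0.1, 0.8])

def fg(v):
    return (0.5 * v @ A @ v) * (b @ v + c)

f, g = 0.5 * x @ A @ x, b @ x + c
fp, gp = A @ x, b
fpp, gpp = A, np.zeros((3, 3))

# Eq. (28): (fg)'' = g f'' + f g'' + f' g'^T + g' f'^T
hess_28 = g * fpp + f * gpp + np.outer(fp, gp) + np.outer(gp, fp)

# Central-difference Hessian of the product for comparison
step = 1e-4
E = np.eye(3)
hess_fd = np.array([[(fg(x + step*E[i] + step*E[j]) - fg(x + step*E[i] - step*E[j])
                      - fg(x - step*E[i] + step*E[j]) + fg(x - step*E[i] - step*E[j]))
                     / (4 * step * step) for j in range(3)] for i in range(3)])

print(np.max(np.abs(hess_28 - hess_fd)))
print(np.allclose(np.outer(fp, gp), np.outer(fp, gp).T))  # f'g'^T alone is not symmetric
```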

Recall that the parametric equation of the generating curve for the shell is r = r(η),
z = z(η). Let η measure arc length along the generator, so that a₀ = 1 and dη = ds. The
finite element extends between s₁ and s₁ + h, so that we may write s = s₁ + hσ, 0 ≤ σ ≤ 1,
and ds = h dσ. If prime denotes differentiation with respect to s and dot differentiation
with respect to σ, then ( )' = h⁻¹ d( )/dσ and ( )'' = h⁻² d²( )/dσ².

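To have a cubic-cubic element we choose the nodal values vector

$$ u_e = (r_1, \dot r_1, z_1, \dot z_1, r_2, \dot r_2, z_2, \dot z_2)^T \qquad (29) $$

and interpolate r and z with

$$ r = u_e^T\phi, \qquad z = u_e^T\psi \qquad (30) $$

where φ and ψ are the shape function vectors

$$ \phi = (\phi_1, \phi_2, 0, 0, \phi_3, \phi_4, 0, 0)^T, \qquad \psi = (0, 0, \phi_1, \phi_2, 0, 0, \phi_3, \phi_4)^T $$

with

$$ \phi_1 = 1 - 3\sigma^2 + 2\sigma^3, \qquad \phi_2 = \sigma - 2\sigma^2 + \sigma^3, \qquad \phi_3 = 3\sigma^2 - 2\sigma^3, \qquad \phi_4 = -\sigma^2 + \sigma^3 \qquad (31) $$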

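We integrate the total potential energy by sampling it at the three Gauss points

$$ \xi_1 = \tfrac{1}{10}(5 - \sqrt{15}), \qquad \xi_2 = \tfrac{1}{2}, \qquad \xi_3 = \tfrac{1}{10}(5 + \sqrt{15}) \qquad (32) $$

and weights

$$ w_1 = w_3 = \tfrac{5}{18}, \qquad w_2 = \tfrac{8}{18} $$

where φⱼ and ψⱼ stand for φ(ξⱼ) and ψ(ξⱼ), and

$$ \phi_{1,3} = \tfrac{1}{100}(50 \pm 12\sqrt{15},\ 5 \pm \sqrt{15},\ 0,\ 0,\ 50 \mp 12\sqrt{15},\ -5 \pm \sqrt{15},\ 0,\ 0)^T, \qquad \phi_2 = \tfrac{1}{8}(4, 1, 0, 0, 4, -1, 0, 0)^T, $$
$$ \dot\phi_{1,3} = \tfrac{1}{10}(-6,\ 2 \pm \sqrt{15},\ 0,\ 0,\ 6,\ 2 \mp \sqrt{15},\ 0,\ 0)^T, \qquad \dot\phi_2 = \tfrac{1}{4}(-6, -1, 0, 0, 6, -1, 0, 0)^T, $$
$$ \ddot\phi_{1,3} = \tfrac{1}{5}(\mp 6\sqrt{15},\ -5 \mp 3\sqrt{15},\ 0,\ 0,\ \pm 6\sqrt{15},\ 5 \mp 3\sqrt{15},\ 0,\ 0)^T, \qquad \ddot\phi_2 = (0, -1, 0, 0, 0, 1, 0, 0)^T, $$
$$ \psi_{1,3} = \tfrac{1}{100}(0,\ 0,\ 50 \pm 12\sqrt{15},\ 5 \pm \sqrt{15},\ 0,\ 0,\ 50 \mp 12\sqrt{15},\ -5 \pm \sqrt{15})^T, \qquad \psi_2 = \tfrac{1}{8}(0, 0, 4, 1, 0, 0, 4, -1)^T, $$
$$ \dot\psi_{1,3} = \tfrac{1}{10}(0,\ 0,\ -6,\ 2 \pm \sqrt{15},\ 0,\ 0,\ 6,\ 2 \mp \sqrt{15})^T, \qquad \dot\psi_2 = \tfrac{1}{4}(0, 0, -6, -1, 0, 0, 6, -1)^T, $$
$$ \ddot\psi_{1,3} = \tfrac{1}{5}(0,\ 0,\ \mp 6\sqrt{15},\ -5 \mp 3\sqrt{15},\ 0,\ 0,\ \pm 6\sqrt{15},\ 5 \mp 3\sqrt{15})^T, \qquad \ddot\psi_2 = (0, 0, 0, -1, 0, 0, 0, 1)^T $$

The upper sign of √15 is for Gauss point 1 and the lower for Gauss point 3.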
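The tabulated Gauss-point vectors can be regenerated from (31) and (32). The following pure-Python sketch uses the hypothetical helper names `hermite`, `to_phi`, `to_psi` and `tabulated`. It evaluates the Hermite cubics and their σ-derivatives and checks them against the closed forms above.

```python
import math

def hermite(sig):
    """Hermite cubics of eq. (31) with first and second sigma-derivatives,
    each returned as the list (N1, N2, N3, N4)."""
    N = [1 - 3*sig**2 + 2*sig**3, sig - 2*sig**2 + sig**3,
         3*sig**2 - 2*sig**3, -sig**2 + sig**3]
    dN = [-6*sig + 6*sig**2, 1 - 4*sig + 3*sig**2,
          6*sig - 6*sig**2, -2*sig + 3*sig**2]
    d2N = [-6 + 12*sig, -4 + 6*sig, 6 - 12*sig, -2 + 6*sig]
    return N, dN, d2N

def to_phi(v):
    """Place (N1..N4) in the r-slots of u_e = (r1, dr1, z1, dz1, r2, dr2, z2, dz2)."""
    return [v[0], v[1], 0.0, 0.0, v[2], v[3], 0.0, 0.0]

def to_psi(v):
    """Place (N1..N4) in the z-slots of u_e."""
    return [0.0, 0.0, v[0], v[1], 0.0, 0.0, v[2], v[3]]

rt15 = math.sqrt(15.0)
gauss = [(5 - rt15) / 10, 0.5, (5 + rt15) / 10]   # eq. (32)
weights = [5 / 18, 8 / 18, 5 / 18]

def tabulated(s):
    """Nonzero entries of phi_j, phi_j-dot, phi_j-ddot as tabulated above;
    s = +1 for Gauss point 1, s = -1 for Gauss point 3."""
    return ([(50 + 12*s*rt15) / 100, (5 + s*rt15) / 100,
             (50 - 12*s*rt15) / 100, (-5 + s*rt15) / 100],
            [-6 / 10, (2 + s*rt15) / 10, 6 / 10, (2 - s*rt15) / 10],
            [-6*s*rt15 / 5, (-5 - 3*s*rt15) / 5, 6*s*rt15 / 5, (5 - 3*s*rt15) / 5])

# Gauss points 1 and 3: computed values agree with the table
for j, s in ((0, 1), (2, -1)):
    for got, ref in zip(hermite(gauss[j]), tabulated(s)):
        assert all(abs(a - b) < 1e-12 for a, b in zip(got, ref))

# The 3-point rule integrates sigma**5 exactly on [0, 1]
assert abs(sum(w * x**5 for w, x in zip(weights, gauss)) - 1 / 6) < 1e-14

print(to_phi(hermite(0.5)[0]))  # phi_2:     [0.5, 0.125, 0.0, 0.0, 0.5, -0.125, 0.0, 0.0]
print(to_psi(hermite(0.5)[1]))  # psi_2-dot: [0.0, 0.0, -1.5, -0.25, 0.0, 0.0, 1.5, -0.25]
```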

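We derive the element data from the energy expression and write for the eth element

$$ \Pi_e = \sum_{j=1}^{3} w_j\, r_{0j}\left[c\,(\epsilon_{1j}^2 + 2\nu\,\epsilon_{1j}\epsilon_{2j} + \epsilon_{2j}^2) + (\kappa_{1j}^2 + 2\nu\,\kappa_{1j}\kappa_{2j} + \kappa_{2j}^2)\right] \qquad (35) $$

where j refers to the jth Gauss point.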

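From the definitions for the element gradient and matrix

$$ g_e = \frac{\partial\Pi_e}{\partial u_e}, \qquad K_e = \frac{\partial^2\Pi_e}{\partial u_e\,\partial u_e^T} \qquad (36) $$

we have that

$$ g_e = 2\sum_{j=1}^{3} w_j r_{0j}\left[c\left(\epsilon_1\epsilon_1' + \nu(\epsilon_1\epsilon_2' + \epsilon_2\epsilon_1') + \epsilon_2\epsilon_2'\right) + \kappa_1\kappa_1' + \nu(\kappa_1\kappa_2' + \kappa_2\kappa_1') + \kappa_2\kappa_2'\right]_j \qquad (37) $$

and

$$ K_e = 2\sum_{j=1}^{3} w_j r_{0j}\Big[c\big(\epsilon_1\epsilon_1'' + \epsilon_1'\epsilon_1'^T + \nu(\epsilon_1\epsilon_2'' + \epsilon_2\epsilon_1'' + \epsilon_1'\epsilon_2'^T + \epsilon_2'\epsilon_1'^T) + \epsilon_2\epsilon_2'' + \epsilon_2'\epsilon_2'^T\big) + \kappa_1\kappa_1'' + \kappa_1'\kappa_1'^T + \nu(\kappa_1\kappa_2'' + \kappa_2\kappa_1'' + \kappa_1'\kappa_2'^T + \kappa_2'\kappa_1'^T) + \kappa_2\kappa_2'' + \kappa_2'\kappa_2'^T\Big]_j \qquad (38) $$

where ( )' = ∂( )/∂u_e.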

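To shorten the notation we introduce

$$ f = \dot r\ddot z - \dot z\ddot r, \qquad g = \dot r^2 + \dot z^2 \qquad (39) $$

so that

$$ \epsilon_1 = h^{-1}g^{1/2} - 1, \qquad \kappa_1 = h^{-1}g^{-1}f - \phi_0', \qquad \epsilon_2 = r_0^{-1}r - 1, \qquad \kappa_2 = r_0^{-1}\left(g^{-1/2}\dot z - \sin\phi_0\right) \qquad (40) $$

and

$$ f' = \ddot z\,\dot\phi + \dot r\,\ddot\psi - \ddot r\,\dot\psi - \dot z\,\ddot\phi, \qquad g' = 2(\dot r\,\dot\phi + \dot z\,\dot\psi), $$
$$ f'' = \dot\phi\ddot\psi^T + \ddot\psi\dot\phi^T - \dot\psi\ddot\phi^T - \ddot\phi\dot\psi^T, \qquad g'' = 2(\dot\phi\dot\phi^T + \dot\psi\dot\psi^T) \qquad (41) $$

Next we write

$$ \epsilon_1' = \tfrac{1}{2}h^{-1}g^{-1/2}g', \qquad \epsilon_1'' = h^{-1}\left(\tfrac{1}{2}g^{-1/2}g'' - \tfrac{1}{4}g^{-3/2}g'g'^T\right), \qquad \epsilon_2' = r_0^{-1}\phi, \qquad \epsilon_2'' = 0 \qquad (42) $$

and

$$ \kappa_1' = h^{-1}\left(g^{-1}f' - g^{-2}f\,g'\right), \qquad \kappa_2' = r_0^{-1}\left(g^{-1/2}\dot\psi - \tfrac{1}{2}g^{-3/2}\dot z\,g'\right) \qquad (43) $$

$$ \kappa_1'' = h^{-1}\left[2g^{-3}f\,g'g'^T + g^{-1}f'' - g^{-2}f\,g'' - g^{-2}(f'g'^T + g'f'^T)\right], $$
$$ \kappa_2'' = r_0^{-1}\left[\tfrac{3}{4}g^{-5/2}\dot z\,g'g'^T - \tfrac{1}{2}g^{-3/2}(g'\dot\psi^T + \dot\psi g'^T) - \tfrac{1}{2}g^{-3/2}\dot z\,g''\right] \qquad (44) $$

which is all we need to program the element gradient and stiffness matrix.

To account for pressure p and point force F we add their potential π*. For the pressure term,

$$ \pi_e^* = p\pi\sum_{j=1}^{3} w_j\, r_j^2\,\dot z_j \qquad (45) $$

where

$$ r_j = u_e^T\phi_j, \qquad \dot z_j = u_e^T\dot\psi_j \qquad (46) $$

and we have that

$$ g_e^* = p\pi\sum_{j=1}^{3} w_j\left(2r_j\dot z_j\,\phi_j + r_j^2\,\dot\psi_j\right) \qquad (47) $$

and

$$ K_e^* = p\pi\sum_{j=1}^{3} w_j\left[2\dot z_j\,\phi_j\phi_j^T + 2r_j\left(\phi_j\dot\psi_j^T + \dot\psi_j\phi_j^T\right)\right] \qquad (48) $$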
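Formulas (39)-(43) can be checked against finite differences. The sketch below uses NumPy, with hypothetical function names and element data, and takes a₀ = 1 as assumed in the text. It evaluates ε₁, κ₁, ε₂ and κ₂ at one sampling point together with their analytic gradients, then compares these with central differences. It does not check the second derivatives.

```python
import numpy as np

def shape_vectors(sig):
    """phi, psi (placed 8-vectors) and their sigma-derivatives, eqs. (30)-(31)."""
    N = np.array([1 - 3*sig**2 + 2*sig**3, sig - 2*sig**2 + sig**3,
                  3*sig**2 - 2*sig**3, -sig**2 + sig**3])
    dN = np.array([-6*sig + 6*sig**2, 1 - 4*sig + 3*sig**2,
                   6*sig - 6*sig**2, -2*sig + 3*sig**2])
    d2N = np.array([-6 + 12*sig, -4 + 6*sig, 6 - 12*sig, -2 + 6*sig])

    def place(v, first):                 # r-slots: 0,1,4,5 ; z-slots: 2,3,6,7
        out = np.zeros(8)
        out[[first, first + 1, first + 4, first + 5]] = v
        return out

    return [place(v, 0) for v in (N, dN, d2N)], [place(v, 2) for v in (N, dN, d2N)]

def strains_and_gradients(u, sig, h, r0, phi0p, sin_phi0):
    """eps1, kappa1, eps2, kappa2 of eq. (40) and their u_e-gradients, eqs. (42)-(43)."""
    (ph, dph, d2ph), (_, dps, d2ps) = shape_vectors(sig)
    r, rd, rdd = u @ ph, u @ dph, u @ d2ph
    zd, zdd = u @ dps, u @ d2ps
    f, g = rd * zdd - zd * rdd, rd**2 + zd**2                      # eq. (39)
    fp = zdd * dph + rd * d2ps - rdd * dps - zd * d2ph             # eq. (41)
    gp = 2.0 * (rd * dph + zd * dps)
    vals = np.array([np.sqrt(g) / h - 1.0,                         # eps1
                     f / (h * g) - phi0p,                          # kappa1
                     r / r0 - 1.0,                                 # eps2
                     (zd / np.sqrt(g) - sin_phi0) / r0])           # kappa2
    grads = np.array([0.5 * gp / (h * np.sqrt(g)),                 # eps1'
                      (fp / g - f * gp / g**2) / h,                # kappa1'
                      ph / r0,                                     # eps2'
                      (dps / np.sqrt(g) - 0.5 * zd * gp / g**1.5) / r0])  # kappa2'
    return vals, grads

# Hypothetical element data
u = np.array([1.0, 0.18, 0.0, 0.09, 0.95, 0.15, 0.2, 0.12])
args = dict(sig=0.5, h=0.2, r0=1.0, phi0p=0.5, sin_phi0=0.3)
vals, grads = strains_and_gradients(u, **args)

# Central differences of the values reproduce the analytic gradients
step = 1e-6
fd = np.column_stack([(strains_and_gradients(u + step * e, **args)[0]
                       - strains_and_gradients(u - step * e, **args)[0]) / (2 * step)
                      for e in np.eye(8)])
print(np.max(np.abs(fd - grads)))
```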

5.  Numerical  computations 

A large number of numerical tests were carried out to check the working of the
element formulation, material modeling, and tracking routine. Some representative examples
will be discussed here. Apart from the most obvious conclusion that our program worked
correctly under the most adverse numerical circumstances, we also observe that for the thin
shell of revolution plasticity is not as interesting as elasticity. The clearest manifestation of
yield happened to be the creation of a plastic hinge at a latitude of excessive bending. Most
examples shown here are, therefore, of a nearly pure elastic nature.

The nonlinear behavior of the severely deformed thin spherical cap is highly complex,
and its finite element computation should be considered a significant numerical feat. In the
following examples s denotes the step size in the tracking predictor, t the shell thickness (its
radius being 1), Ne the number of elements, and θ₀ the angle between the r axis and the
tangent to the generator at z = 0 (θ₀ = 0° means a complete sphere, while θ₀ = 180° means
a flat plate). All discretizations are done with Ne = 7.

Figure 6 shows the inversion of a spherical cap (t = 0.002, θ₀ = 5π/8) by an apex force for
a step size of s = 0.5. At first the shell exhibits considerable stiffness, as seen from the closely
spaced equilibrium configurations, but a point is reached at which the dent becomes unstable and
snaps through to the inverted form through a wavy pattern. It is remarkable that orthogonal
tracking handled this transition smoothly.

Figure  7  refers  to  the  same  cap,  except  for  a  different  edge  condition.  We  observe  that 
the  fixed  rim  condition  has  a  considerable  stiffening  effect  on  the  shell  near  the  transition 
point. 

Figure 8 represents a thinner, and hence more flexible, shell. Transition occurs earlier here,
and the relatively large s = 0.5 caused the program to jump between widely separated
equilibrium states.

A larger step size causes earlier inversion, as seen in Fig. 9. We show these
examples again (Fig. 10) to demonstrate the robustness of the algorithm, which takes a distant initial
guess for the equilibrium configuration to successful convergence.

The interest of Fig. 11 lies in the fact that the tracking algorithm landed on an equilibrium
configuration that is mathematically correct but physically impossible. There is no
provision in the algorithm to tell it that the shell may not loop. In any event, the shell
became so stiff in the loop that the algorithm could not get out of it, and the program was
aborted.

To discern in Fig. 12 which of the equilibrium configurations are fictitious and which are not
requires some discrimination and more numerical evidence. What is interesting is that the
program actually inverted the cap. Loading started at the lower half and ended in the upper
half of the shell. Some of the bending modes appear to be accompanied by considerable
stretching, but in the presence of such large displacements and high strains it may well be
that the shell prefers to stretch rather than to bend.
Figure 1(b). Constant load corrector.

Figure 4. Stress σ vs. strain ε with distinct yield point.

Figure 7. Inverted spherical cap.

Figure 8. Inverted spherical cap.

Figure 9. Inverted spherical cap (s = 1.0, t = 0.002).

Figure 11. Impossible loop.


ASPECTS OF EDGE CONSTRAINTS IN SHEAR-DEFORMABLE
PLATE AND SHELL ELEMENTS


Alexander  TESSLER 
Mechanics  and  Structures  Branch 
U.S.  Army  Materials  Technology  Laboratory 
Watertown, Massachusetts 02172, U.S.A.

ABSTRACT. The method of explicit edge constraints for generating
simple, consistent and efficient shear-deformable displacement bending
elements is discussed. Particular attention is focused on the derivation
of a highly desirable three-node shallowly curved shell element.
Shell theory and finite element approximation issues are discussed in
detail. Several numerical studies are carried out which demonstrate the
effectiveness of the constraint methodology.

I. INTRODUCTION. The search for "optimal" shell finite elements
has been underway for nearly two decades. In recent years it has
further accelerated in light of significant progress in the technology
of shear-deformable C⁰ bending elements (e.g., [1-19]). Although the
main obstacles for these developments, known as shear and membrane
locking phenomena, have been addressed extensively and several remedial
schemes have been proposed, a viable three-node doubly curved
shear-deformable element, which is the most desirable element for
general shell analysis, has not yet been developed. The purpose of this
effort is to derive such an element.

We base our finite element derivation upon Reissner-Mindlin plate
theory, which will constitute the bending part of the element. To
account for the membrane deformations and the membrane-bending coupling
associated with the shell-element curvatures, we shall resort to
Marguerre's shallow shell equations. Shallow shell elements of this
type specialized to the axisymmetric response proved effective in
discretizing shallow as well as deep shell structures [12]. The major
advantage of this analytic approach over general shell formulations
(e.g., [5,18]) is its inherent simplicity. Herein, the displacements
and stress resultants are attributed to the element reference plane.
Consequently, integrations are carried out across the reference plane
rather than the curved surface as in the general shell elements.

According to Reissner-Mindlin theory [21-23], the strain-displacement
relations can be expressed as:

(1)


The Marguerre membrane strain-displacement relations for a thin shallow
shell have the form [24]:

$$ e = L_1 u + L_1(\zeta)\,L_2 w \qquad (5) $$

with

$$ u = \{u, v\}^T \qquad (6) $$

where u and v are the membrane displacements in the x and y coordinate
directions, respectively, and ζ = ζ(x, y) is the initial height of the
shallow shell.

One important aspect, which has not been addressed in previous attempts to merge the two
theories [9-12], is the conceptual difference in
the transverse displacement variables appearing in (2) and (5). In (2),
w is a weighted-average transverse displacement across the thickness,
whereas in (5) w represents the midsurface transverse displacement. The
former variable comes into play due to the inclusion of shear
deformation in Reissner-Mindlin theory; the latter is a consequence
of the Kirchhoff thin-regime assumption, which neglects shear deformation.
Utilizing (2), the Kirchhoff thinness constraint reads:

$$ L_2 w = -I\theta \qquad (7) $$

Substituting (7) into (5) yields the Marguerre membrane strains consistent
with the Reissner-Mindlin strains:

$$ e = L_1 u - L_1(\zeta)\,\theta \qquad (8) $$


The stress resultants, which are attributed to the reference plane
of the shell, are related to the strains through the constitutive law:

$$ N = \{N_{xx}, N_{yy}, N_{xy}\}^T = A e \qquad (9) $$

$$ M = \{M_{xx}, M_{yy}, M_{xy}\}^T = D\kappa \qquad (10) $$

$$ Q = \{Q_x, Q_y\}^T = G\gamma \qquad (11) $$

where A, D and G are, respectively, the membrane, bending and transverse
shear constitutive matrices. The principle of virtual work can then be
employed to derive the finite element stiffness equilibrium equations:

$$ \iint_A \left(N^T\delta e + M^T\delta\kappa + Q^T\delta\gamma - q\,\delta w\right) dA = 0 \qquad (12) $$

where q is the distributed transverse loading, A is the reference plane
area, and δ denotes the variational operator.

II. FINITE ELEMENT ISSUES. The development of effective curved
shear-deformable shell elements is severely hampered by the "locking
phenomena" (extreme stiffening), reflecting the inability of the shell
to bend without stretching ("membrane locking") and without transverse shearing
("shear locking"). The two phenomena are directly linked to the penalized
strain energy which, in its nondimensional form, can be expressed as:
$$ U(\kappa, \gamma, e) = U_b(\kappa) + \alpha_s U_s(\gamma) + \alpha_m U_m(e) \qquad (13) $$

in which U_b(κ), U_s(γ), and U_m(e) denote the nondimensional bending,
transverse shear, and membrane energy integrals, and α_s and α_m are
the nondimensional shear and membrane penalty parameters, respectively.
Note that α_s = O(λ²/t²) and α_m = O((βλ)²/t²), where λ and β are, respectively,
some characteristic span and slope of the shallow shell [11,12]. As the
shell thickness, t, diminishes to zero, both α_s and α_m approach infinity,
thereby enforcing vanishing shear and membrane strains:

$$ L_2 w \to -I\theta \quad \text{(Kirchhoff constraints)} \qquad (14a) $$

$$ L_1 u \to L_1(\zeta)\,\theta \quad \text{(membrane inextensibility constraints)} \qquad (14b) $$

The particular appeal of this theory is that the variational
statement (12) requires a class of C⁰-continuous approximations for the
w, u, and θ fields (since their highest spatial derivative in (12) is of
order one) and, therefore, simple shape functions can be used. On the
other hand, constraints (14), when imposed at the element level, pose
severe limitations on the kinematic freedom attainable by each element.

A consistent resolution of this deficiency for a successful
discretization of the theory is twofold: (i) redefine the penalty
parameters to allow relaxation of (14) at the element level; (ii)
implement appropriate interpolation schemes to best accommodate (14).
The two complementary approaches have been shown to be effective and have produced
a series of efficient and reliable bending elements [6-8,11-14].

(i) REVIEW OF PENALTY RELAXATION CONCEPT. The first approach introduces
a parametric device into the variational statement for
the purpose of relaxing the enforcement of penalty constraints at the
element level.

Concurrently with the element displacement approximations, defined
as w^h, u^h, and θ^h, we also approximate the constitutive matrices A and
G, incorporating appropriate "penalty relaxation" parameters for the
element:
$$ N^h = \phi_m^h A e^h, \qquad Q^h = \phi_s^h G\gamma^h, \qquad M^h = D\kappa^h \qquad (15) $$

where the element strains are

$$ e^h = L_1 u^h - L_1(\zeta)\,\theta^h, \qquad \gamma^h = L_2 w^h + I\theta^h, \qquad \kappa^h = L_1\theta^h \qquad (16) $$

and the penalty relaxation parameters are nondimensional positive
quantities of the form:

$$ \phi_i^h = (1 + C_i\alpha_i^h)^{-1} \qquad (i = m, s) \qquad (17) $$

where C_i are positive element constants, and α_i^h are element analytic
penalty parameters of the order α_s^h = O(h²/t²) and α_m^h = O((β^h h)²/t²), where h
and β^h are, respectively, some characteristic span and slope of the
element. The corresponding principle of virtual work for a single
element approximation takes the form:

$$ \iint_A \left[(N^h)^T\delta e^h + (M^h)^T\delta\kappa^h + (Q^h)^T\delta\gamma^h - q^h\delta w^h\right] dA = 0 \qquad (18) $$

where integration extends over the element reference plane, with A
denoting the element reference area. The resulting element strain
energy appears in the basic form of (13), except that all quantities are
superscribed with h (i.e., element approximations); however, the element
penalty parameters take a fundamentally different form:

$$ \bar\alpha_i^h = \frac{\alpha_i^h}{1 + C_i\alpha_i^h} \qquad (i = m, s) \qquad (19) $$

These penalties remain bounded as t → 0 (ᾱ_i^h → 1/C_i), which relaxes the enforcement of (14)
and thus alleviates possible spurious constraining. Note, however, that as the kinematic
approximations improve with h-refinement, ᾱ_i^h approach their analytic
values α_i, thus ensuring convergence to the "true" solution in both the
constitutive and kinematic sense [4,8,12].
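A few numbers show the effect of (17) and (19). For illustration only, the sketch below assumes that α_s^h equals its order estimate (h/t)² with h = 1. It uses C_s = 2, the value adopted in Section III.

```python
def relaxation_factor(alpha_h, C):
    """phi_i^h = 1 / (1 + C_i alpha_i^h), eq. (17)."""
    return 1.0 / (1.0 + C * alpha_h)

def relaxed_penalty(alpha_h, C):
    """Element penalty of eq. (19): alpha_bar = phi * alpha = alpha / (1 + C alpha)."""
    return relaxation_factor(alpha_h, C) * alpha_h

C_s = 2.0          # shear relaxation constant reported in Section III
h = 1.0            # hypothetical element span
for t in (0.5, 0.1, 0.01, 0.001):
    alpha = (h / t) ** 2       # assumption: alpha_s^h set equal to its order estimate h^2/t^2
    print(f"t = {t:6.3f}  alpha = {alpha:10.1f}  alpha_bar = {relaxed_penalty(alpha, C_s):.6f}")
```

As t decreases, ᾱ_s^h rises toward 1/C_s = 0.5 instead of growing without bound. When α_s^h is small, ᾱ_s^h ≈ α_s^h.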

(ii) ANISOPARAMETRIC INTERPOLATION SCHEME. The other fundamental means
for improving element behavior is to devise appropriate interpolation
schemes which best accommodate the requirements of (14). Such interpolations,
termed anisoparametric [13], employ polynomials of distinctly different degree
for w, θ, and u to reflect the differences in the order of
the differential operators L₂ and I in (14a) and, likewise, L₁ and
L₁(ζ) in (14b). The specific aim is to design out the unwanted "spurious"
constraint equations arising from (14) [14].

To represent the bending part of the shell element, we adopt the
3-node anisoparametric plate element [13,14], in which θ_x and θ_y are
interpolated linearly, while w is represented by a complete quadratic
polynomial; throughout the formulation, area-parametric coordinates
ξ = (ξ₁, ξ₂, ξ₃) are used as a basis for all interpolations (refer to Fig. 2):

$$ \theta_I^h = N^{(1)}\boldsymbol{\theta}_I^h \quad (I = x, y), \qquad w^h = N^{(2)}\mathbf{w}^h \qquad (20) $$

where N^(1) and N^(2) are the row vectors of linear and quadratic shape
functions, respectively, and

$$ (\boldsymbol{\theta}_I^h)^T = \{\theta_{Ij}\}, \qquad (\mathbf{w}^h)^T = \{w_k\} \qquad (I = x, y;\ j = 1, 2, 3;\ k = 1, \ldots, 6) \qquad (21) $$

are the vectors of nodal dof.

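Adopting the shell element of constant curvature (i.e., interpolating
ζ^h(ξ) parabolically), constraints (14b) necessitate a complete
10-term cubic polynomial for the u and v displacements, since with ζ^h quadratic and θ^h linear
the right-hand side of (14b) is quadratic, and L₁u^h must be quadratic as well:

$$ u^h = N^{(3)}\mathbf{u}^h, \qquad v^h = N^{(3)}\mathbf{v}^h \qquad (22) $$

where N^(3) is a row vector of cubic shape functions, and

$$ (\mathbf{u}^h)^T = \{u_k\}, \qquad (\mathbf{v}^h)^T = \{v_k\} \qquad (k = 1, \ldots, 10) \qquad (23) $$

are the vectors of nodal dof.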

Evidently, the anisoparametric interpolations produce the same
polynomial representation for the left- and right-hand sides of the
constraint equations (14), a condition that is paramount to enhancing
element behavior in the vanishing-thickness regime.

(a) Edge Shear Constraints. Although the initial w^h rests on six
w dof (i.e., three corner and three mid-edge dof), a kinematically
consistent elimination of the mid-edge dof is possible prior to the
element stiffness derivation. To obtain a 3-node pattern, w^h can be
constrained by the one-dimensional edge constraints:


$$ \frac{\partial\gamma_{sz}^{(k)}}{\partial s} = \frac{\partial}{\partial s}\left[w^h_{,s}(s) + \theta_n^h(s)\right]^{(k)} = 0 \qquad (k = 1, 2, 3) \qquad (24) $$

where s denotes a coordinate running along the kth edge of the triangular
element reference plane, and θ_n^h(s) is the tangential edge rotation,
which is related to θ_x^h and θ_y^h via an orthogonal transformation.

From (24) there result three decoupled equations in terms of the
mid-edge w^h dof, which give rise to the constraints:

$$ \mathbf{w}_c^h = \mathbf{W}_w\mathbf{w}^h + \mathbf{W}_x\boldsymbol{\theta}_x^h + \mathbf{W}_y\boldsymbol{\theta}_y^h \qquad (25) $$

where W_q are 3×3 transformation matrices, and

$$ (\mathbf{w}_c^h)^T = \{w_{j+3}\}, \qquad (\mathbf{w}^h)^T = \{w_j\} \qquad (j = 1, 2, 3) \qquad (26) $$

Upon substituting (25) into (20), we obtain a 3-node interpolation for
the transverse displacement.
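For one edge, the constraint (24) can be solved by hand. Take w quadratic along the edge, with corner values w₁ and w₂ and mid-edge value w_m at s = L/2, and take θ_n linear between θ₁ and θ₂. Then ∂γ_sz/∂s = 0 gives w_m = (w₁ + w₂)/2 + L(θ₂ − θ₁)/8. This closed form is derived here for the sign convention γ_sz = w,s + θ_n exactly as written in (24); it is not quoted from the paper. The sketch below uses hypothetical names and data, and confirms that the resulting edge shear strain is constant.

```python
def midedge_deflection(w1, w2, th1, th2, L):
    """Mid-edge w from d(gamma_sz)/ds = 0 on one edge, eq. (24), for a quadratic w
    and a linear theta_n, with gamma_sz = w_,s + theta_n as written in (24)."""
    return 0.5 * (w1 + w2) + L * (th2 - th1) / 8.0

def gamma_sz(s, w1, wm, w2, th1, th2, L):
    """Edge shear strain w_,s + theta_n at 0 <= s <= L."""
    xi = s / L
    dw_ds = (w1 * (4*xi - 3) + wm * (4 - 8*xi) + w2 * (4*xi - 1)) / L   # quadratic Lagrange w
    return dw_ds + th1 * (1 - xi) + th2 * xi                           # linear theta_n

w1, w2, th1, th2, L = 0.01, -0.02, 0.05, -0.03, 2.0   # hypothetical edge data
wm = midedge_deflection(w1, w2, th1, th2, L)
print(wm, [gamma_sz(s, w1, wm, w2, th1, th2, L) for s in (0.0, 0.5, 1.0, 2.0)])
```

The constant value equals (w₂ − w₁)/L + (θ₁ + θ₂)/2, the average of the two corner contributions.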

(b) Edge Membrane Constraints. In a manner analogous to the
above dof reduction for w^h, one-dimensional edge constraints can be
devised to condense out the intra-edge u^h and v^h dof. The following
constraint equations provide four edge-compatible relations for each
edge:


(k) 


e  (s) 

h  -h  -h 

u,  -  C  9 

s 

3 

-  s  s  n 

Y  (s) 
sn 

3s^ 

v^  -  9^-  9^ 

1 

—  s  n  n  s  s 

(k=l,2,3;  p=1.2) 

(27) 


h  Vi 

where  u  (s)  and  v  (s)  are  cubic  displacement  fields  along  and  normal  to 
the  k-th  edge,  respectively,  and 
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(28) 


cNc),. 


(q=  s,n;  i:=l,2,3) 


are  the  k-th  edge  slopes. 

By the use of appropriate orthogonal transformations, (27) are
expressed in terms of the shell element variables of interest, namely
the u^h, v^h, θ_x^h and θ_y^h dof, and algebraically solved for the intra-edge
u^h and v^h dof:

$$ \mathbf{u}_c^h = \mathbf{U}\mathbf{u}^h + \mathbf{U}_x\boldsymbol{\theta}_x^h + \mathbf{U}_y\boldsymbol{\theta}_y^h, \qquad \mathbf{v}_c^h = \mathbf{V}\mathbf{v}^h + \mathbf{V}_x\boldsymbol{\theta}_x^h + \mathbf{V}_y\boldsymbol{\theta}_y^h \qquad (29) $$

where

$$ (\mathbf{u}_c^h)^T = \{u_i^h\}, \qquad (\mathbf{v}_c^h)^T = \{v_i^h\} \qquad (i = 4, \ldots, 9) \qquad (30) $$

and U_q and V_q are 6×3 transformation matrices. Equations (29) are
substituted into the initial interpolations (22) to give the constrained
fields for the membrane displacements in terms of the corner-node dof
and two centroidal dof. The latter dof are condensed out statically
after the formation of the element stiffness matrix and consistent load
vector. Consequently, a 3-node, 15-dof element pattern is achieved.

Note that the edge constraint procedures just described preserve
the original polynomial order of the constrained variables (w^h, u^h and
v^h); moreover, one can show that the constrained fields are fully
compatible across element edges, and they allow for rigid-body motion
without straining. For further details on this procedure and for the
explicit form of the shape functions, the interested reader is referred
to [13,14,20].

The remainder of the formulation follows standard finite element
procedures. Application of the virtual work statement (18), while
performing exact integration throughout, yields the element stiffness
equations. The rotational variable normal to the reference
plane, needed to avoid mathematical singularities in the global
coordinates, adds three dof to the element (e.g., see [10]).
III. NUMERICAL EXAMPLES. An important step in completing the
relaxation methodology of Section II is to obtain appropriate values for
the element penalty parameters α_i^h and the constants C_i (i = m, s). Herein, we adopt the approach
developed in [13], where α_i^h are defined as:

$$ \alpha_i^h = \sum_{6} k_i^e \Big/ \sum_{6} k_b^e \qquad (i = s, m;\ b = \text{bending}) \qquad (31) $$

in which k_i^e and k_b^e denote the element diagonal stiffness coefficients
associated with the θ_x^h and θ_y^h dof for the unrelaxed case, i.e., φ_i^h = 1.
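Definition (31) and the relaxation factor (17) can be written directly in code. The diagonal coefficients below are hypothetical numbers, not results from the paper. C_s = 2 is the value adopted in this study.

```python
def penalty_from_diagonals(k_i, k_b):
    """alpha_i^h of eq. (31): sum of the six diagonal stiffness coefficients of the
    theta_x, theta_y dof from energy i (shear or membrane) over the bending sum,
    both taken from the unrelaxed element (phi_i^h = 1)."""
    if len(k_i) != 6 or len(k_b) != 6:
        raise ValueError("expected six diagonal coefficients (theta_x, theta_y at 3 nodes)")
    return sum(k_i) / sum(k_b)

def relaxation_factor(alpha_h, C):
    """phi_i^h = 1 / (1 + C_i alpha_i^h), eq. (17)."""
    return 1.0 / (1.0 + C * alpha_h)

# Hypothetical diagonal coefficients for one element
k_shear = [30.0, 28.0, 31.0, 29.0, 30.0, 32.0]
k_bend = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
alpha_s = penalty_from_diagonals(k_shear, k_bend)
phi_s = relaxation_factor(alpha_s, C=2.0)      # C_s = 2
print(alpha_s, phi_s)
```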

As for the "optimal" values of C_s and C_m, these are determined from
numerical testing. The shear relaxation constant, C_s = 2, has already
been established to ensure locking-free plate-element behavior [13];
C_m = 1 was chosen from the numerical results of the present study.

The present element was critically tested on four challenging shell
problems, where two of its versions were employed: (a) the element with
both the shear and membrane relaxations (C_s = 2, C_m = 1), labeled "MIN3sm";
and (b) the element with the shear relaxation only (C_s = 2, C_m = 0), labeled
"MIN3s". Our findings are summarized as follows.

(i)  Test  of  Rigid-Body  Motion.  A  spectral  analysis  was  performed  on  the 
element  stiffness  matrix  for  the  flat,  singly  curved,  and  doubly  curved 
element  geometry,  to  check  MIN3's  ability  to  move  as  a  rigid  body 
without  incurring  any  straining.  Under  all  conditions  tested,  there 
resulted  six  requisite  zero  eigenvalues  associated  with  rigid  body 
motion. 

(ii) Clamped Circular Arch. A simple test of both membrane inextensibility and shearless deformation is a clamped, thin circular arch under a tip bending moment (Fig. 3). An additional modeling difficulty is that the arch is rather narrow, so the element aspect ratios are large. At all discretization levels, exact values of the stress resultants are obtained in each element (i.e., $M_x = M$, with all forces vanishing). Figure 3 depicts a convergence study of the tip bending rotation, which is also a direct measure of the strain energy for this problem. Note that MIN3s exhibits considerable membrane stiffening; MIN3sm is clearly the superior performer, yielding highly accurate results even under coarse discretizations.
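The statement that the tip rotation measures the strain energy can be made explicit. This sketch assumes a pure-bending state with the constant moment $M$ acting along an arch of arc length $L$ and bending stiffness $EI$, with no membrane or shear strain, as in the exact solution described above:

$$\theta_{\text{tip}} = \int_0^L \frac{M}{EI}\,ds = \frac{M L}{EI}, \qquad U = \int_0^L \frac{M^2}{2EI}\,ds = \frac{M^2 L}{2EI} = \tfrac{1}{2}\,M\,\theta_{\text{tip}}.$$

For a fixed applied moment the strain energy is therefore proportional to the computed tip rotation. Spurious membrane stiffening shows up directly as an underestimated $\theta_{\text{tip}}$.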
(iii) Pinched Cylinder with Free Ends. The free-ended cylindrical shell subjected to two radial forces 180 degrees apart (Fig. 4) is a widely used test problem to establish how well a singly curved shell element can represent inextensional bending. As $t/R \to 0$, a purely inextensional state of deformation is attained in the cylinder.

Figure 4 shows convergence studies of the deflection under the load for the moderately thin ($R/t = 50$) and very thin ($R/t = 2000$) cylinders. The present results are compared with the exact solution and with those of four reduced-integration quadrilateral elements (for details on these quadrilaterals, refer to [1]). Both MIN3s and MIN3sm exhibit excellent behavior, with MIN3s being somewhat stiffer than MIN3sm.

(iv) Pinched Hemisphere. A thin hemispherical shell under self-equilibrating radial forces (Fig. 5) is in a state of nearly inextensional bending, with large rigid-body rotations in the deformed configuration. This problem is a challenging test for doubly curved elements (e.g., see [25]).

The convergence study for the deflection under the load is depicted in Fig. 5, where the results of nine quadrilateral elements examined in [5] are included for comparison. Here again, MIN3sm emerged among the best-performing elements, while MIN3s exhibited some excessive membrane stiffening.

In summary, we conclude that our three-node shallow shell element (MIN3sm) is an excellent candidate for general shell analysis: it is theoretically sound, has the simplest nodal/dof pattern, possesses six rigid-body modes, and is free of both membrane and shear locking.
Fig. 1. Shallow Shell Notation.

Fig. 2. (a) Area-Parametric Coordinates,

(b)  Initial  and  Constrained  Nodal  Patterns, 

(c)  u"  and  v"  Initial  and  Constrained  Nodal  Patterns 
Fig. 4. Pinched Cylinder with Free Ends.

Fig. 5. Pinched Hemisphere (horizontal axis: number of nodes per side).

ABSTRACT

Finite  element  methods  are  used  to  approximate  parameters  which 
characterise  fracture  properties  of  solids  containing  stationary  cracks 
under  conditions  of  (a)  elasto-plastic  and  (b)  viscoelastic  deformation. 

1. FINITE ELEMENT METHODS FOR PLASTICITY AND VISCOELASTICITY

1.1  Introduction 

In this paper we are concerned with the finite element approximation of parameters which characterise fracture properties (a) for stationary cracks in materials which exhibit elasto-plastic deformation and (b) for stationary cracks in materials which exhibit viscoelastic deformation. For these problems the first task is to define satisfactory mathematical models of the deformation and of a fracture parameter, after which the finite element method can be applied so that the deformation and the fracture parameters can be approximated. This field is currently the subject of intensive research, as is evident from the succession of conference proceedings on nonlinear computational solid mechanics and fracture mechanics which are appearing, and the work reported here is as yet only at a preliminary stage.

Only  planar  problems  are  considered  here  and  the  approach  to  the 
discretisation  of  both  the  elasto-plastic  and  viscoelastic  problems  is  via 
the  stress  equilibrium  equation  of  continuum  mechanics  and  the  constitutive 
relations  relevant  to  each  context.  The  Galerkin  method  is  applied  in  each 
case. 

A J-type path integral is employed for fracture in the elasto-plastic case. This is introduced and approximated in Section 2, after which results of some numerical experiments for an elastic perfectly plastic problem are presented. The limitations of this approach to nonlinear fracture are discussed. A similar approach, but also involving crack opening displacement (COD), is taken in Section 3 for viscoelastic fracture, where the concept of a failure zone is discussed and an algorithm for the finite element analysis of a fracture problem involving such a zone at the tip of a stationary crack is described. Again some results are presented.

1.2 Equilibrium Problem and Galerkin Approximation

We consider a two-dimensional solid occupying a region $\Omega \subset \mathbb{R}^2$ with boundary $\partial\Omega \equiv \partial\Omega_C \cup \partial\Omega_T$. The displacement at any point $x = (x_1, x_2)^T$ of $\Omega$ (the reference configuration) is denoted by $u = (u_1, u_2)^T$, whilst the stress and strain tensor components are denoted respectively by $\sigma_{ij}$ and $\varepsilon_{ij}$. The deformation of the body under the action of external forces $f = (f_1, f_2)^T$ and boundary tractions $g = (g_1, g_2)^T$ satisfies the equilibrium equation

$$\sum_{j=1}^{2} \frac{\partial \sigma_{ij}(x)}{\partial x_j} + f_i(x) = 0, \quad x \in \Omega, \quad i = 1,2, \tag{1.1}$$

together with the boundary conditions

$$u(x) = 0, \quad x \in \partial\Omega_C, \tag{1.2}$$

$$\sum_{j=1}^{2} \sigma_{ij}(x)\, n_j = g_i(x), \quad x \in \partial\Omega_T, \quad i = 1,2, \tag{1.3}$$

where $n = (n_1, n_2)^T$ is the outward unit normal to $\partial\Omega$.

The finite element method is applied to problem (1.1)-(1.3) via a weak formulation and the Galerkin technique. This is obtained by first taking the scalar product of (1.1) with a test vector function $v \in V$, where

$$V \equiv \{ v : v \in (H^1(\Omega))^2,\ v_i = 0 \text{ on } \partial\Omega_C,\ i = 1,2 \}$$

is the space of admissible vectors, and then integrating by parts. Thus in the weak problem we seek $u \in V$ such that

$$\int_\Omega \sigma^T \varepsilon(v)\,dx - \int_\Omega f^T v\,dx - \int_{\partial\Omega_T} g^T v\,ds = 0 \quad \forall\, v \in V, \tag{1.4}$$

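where in (1.4) the strain tensor components $\varepsilon_{ij}$ are defined by

$$\varepsilon_{ij}(v) = \frac{1}{2}\left(\frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i}\right), \quad i,j = 1,2, \tag{1.5}$$

the vectors $\varepsilon$ and $\sigma$ are given by $\varepsilon = (\varepsilon_{11}, \varepsilon_{22}, 2\varepsilon_{12})^T$ and $\sigma = (\sigma_{11}, \sigma_{22}, \sigma_{12})^T$, and the displacement $u$ is involved in (1.4) through $\sigma$ via an appropriate constitutive relation.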


For the application of the finite element method the region $\Omega$ is partitioned into elements, $\Omega = \bigcup_e \Omega^e$. A finite dimensional space $S^h \subset V$ consisting of piecewise polynomial functions defined over the partition is set up, and the Galerkin problem approximating (1.4) is that of finding $u_h \in S^h$ such that

$$\int_\Omega \sigma_h^T \varepsilon(v_h)\,dx - \int_\Omega f^T v_h\,dx - \int_{\partial\Omega_T} g^T v_h\,ds = 0 \quad \forall\, v_h \in S^h, \tag{1.6}$$

where, in a similar manner to (1.4), the approximation $u_h$ to the displacement $u$ is involved in (1.6) through $\sigma_h$ via an approximation to an appropriate constitutive relation.


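Each component of the approximating vector $u_h(x)$ is defined in terms of basis functions $N_k(x)$ for the $n$ nodes of $\Omega$. In terms of the point evaluations $U_k$ of $u_h(x)$ at the nodes $k = 1, 2, \ldots, n$,

$$u_h(x) = \mathbf{N}(x)\,\mathbf{U}, \tag{1.7}$$

where $\mathbf{N}(x) = [\mathbf{N}_1(x), \mathbf{N}_2(x), \ldots, \mathbf{N}_n(x)]$ with $\mathbf{N}_k(x) = N_k(x) I_2$. Here $I_2$ is the $2 \times 2$ unit matrix and $\mathbf{U} = (U_1^T, \ldots, U_n^T)^T$. If we define the approximate strain

$$\varepsilon_h \equiv \varepsilon(u_h(x)) \equiv B\,\mathbf{U} \tag{1.8}$$

in the usual way, see e.g. Zienkiewicz [13], we now require a constitutive relation between $\sigma$ and $\varepsilon$.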
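As a concrete instance of (1.8), the Python sketch below builds $B$ for the simplest element, a 3-node linear triangle. This element is our choice for illustration; the paper itself later uses 8-node quadrilaterals. The sketch uses the strain ordering $\varepsilon = (\varepsilon_{11}, \varepsilon_{22}, 2\varepsilon_{12})^T$ of (1.5) and the nodal ordering $\mathbf{U} = (u_1, v_1, u_2, v_2, u_3, v_3)^T$. A rigid translation or infinitesimal rotation must produce zero strain, which gives a quick check. The helper name `triangle_B` is hypothetical.

```python
import numpy as np


def triangle_B(xy):
    """Strain-displacement matrix B of a 3-node linear triangle.

    xy: (3, 2) array of nodal coordinates, listed anticlockwise.
    Returns B (3 x 6) with eps = B @ U, eps = (e11, e22, 2*e12),
    U = (u1, v1, u2, v2, u3, v3).
    """
    (x1, y1), (x2, y2), (x3, y3) = xy
    two_area = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    b = np.array([y2 - y3, y3 - y1, y1 - y2])  # 2A * dN_k/dx
    c = np.array([x3 - x2, x1 - x3, x2 - x1])  # 2A * dN_k/dy
    B = np.zeros((3, 6))
    B[0, 0::2] = b  # e11 = du/dx
    B[1, 1::2] = c  # e22 = dv/dy
    B[2, 0::2] = c  # 2*e12 = du/dy + dv/dx
    B[2, 1::2] = b
    return B / two_area


xy = np.array([[0.0, 0.0], [2.0, 0.0], [0.5, 1.5]])
B = triangle_B(xy)

translation = np.tile([1.0, 0.0], 3)                        # u = 1, v = 0 at every node
rotation = np.column_stack([-xy[:, 1], xy[:, 0]]).ravel()   # u = -y, v = x
print(np.allclose(B @ translation, 0), np.allclose(B @ rotation, 0))  # True True
```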

For an isotropic material and linear elasticity this is

$$\sigma(u(x)) = D\,\varepsilon(u(x)), \tag{1.9}$$

where $D$ is the $3 \times 3$ matrix arising from Hooke's law. The matrix $D$ depends on the Lamé coefficients $\lambda$ and $\mu$ of the material. Using (1.9) with (1.6), and taking $v_h$ in turn to be each column of $\mathbf{N}$, we obtain the linear equation system

$$\left(\int_\Omega B^T D B\,dx\right)\mathbf{U} - \int_\Omega \mathbf{N}^T f\,dx - \int_{\partial\Omega_T} \mathbf{N}^T g\,ds = 0, \tag{1.10}$$

which when solved produces $\mathbf{U}$ and hence $u_h(x)$.
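To make the assembly behind (1.10) concrete, here is a minimal Python sketch of a one-dimensional analogue. This analogue is an assumption for illustration, not the paper's plane problem. It models a bar $0 \le x \le L$ with axial stiffness $EA$ (playing the role of $D$), a uniform body force $f$, the constraint $u(0) = 0$ and an end traction $g$ at $x = L$. With linear elements $B = [-1, 1]/h$ on each element, and the element integrals of $B^T D B$ and $\mathbf{N}^T f$ are evaluated exactly. The function name `solve_bar` is hypothetical.

```python
import numpy as np


def solve_bar(n_el, L=1.0, EA=1.0, f=1.0, g=0.5):
    """Assemble and solve the 1D analogue of (1.10) with linear elements."""
    n_nodes = n_el + 1
    h = L / n_el
    K = np.zeros((n_nodes, n_nodes))
    F = np.zeros(n_nodes)
    B = np.array([-1.0, 1.0]) / h                        # strain = B @ (u_a, u_b)
    for e in range(n_el):
        dofs = [e, e + 1]
        K[np.ix_(dofs, dofs)] += EA * np.outer(B, B) * h  # integral of B^T D B
        F[dofs] += f * h / 2.0                            # integral of N^T f
    F[-1] += g                                            # boundary term N^T g at x = L
    U = np.zeros(n_nodes)
    U[1:] = np.linalg.solve(K[1:, 1:], F[1:])             # impose u(0) = 0
    return np.linspace(0.0, L, n_nodes), U


L, EA, f, g = 1.0, 1.0, 1.0, 0.5
x, U = solve_bar(8, L, EA, f, g)
exact = (f * (L * x - x**2 / 2) + g * x) / EA   # solution of -EA u'' = f, EA u'(L) = g
print(np.max(np.abs(U - exact)))  # round-off level: nodal values are exact in 1D
```

The exactness of the nodal values is a special property of this 1D model problem. In two dimensions the Galerkin solution is only an approximation at the nodes.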


2. ELASTO-PLASTIC PROBLEM AND NONLINEAR FRACTURE

2.1  Elasto-plastic  Mathematical  Model 

In  order  to  provide  a  mathematical  model  for  the  case  where  the 
material  of  the  solid  exhibits  an  elasto-plastic  response,  we  have  to  set 
up  a  model  of  the  constitutive  relationship  between  stress  and  strain 
appropriate  to  the  nonlinear  post-yield  plastic  case. 

We adopt here the incremental (flow) theory of plasticity and apply the loading incrementally. Thus in (1.1)-(1.3) we consider increments $d\sigma$, $d\varepsilon$ and $du$, respectively of stress, strain and displacement, which result from increments of loading $df$ and $dg$. The displacement $u$ is now a function not only of space but also of the current load. We therefore introduce a load factor $t$ (fraction of total load), so that $u = u(x,t)$. In the usual manner, see e.g. Owen and Hinton [7] and Harrison, Ward and Whiteman [2], the level of stress at which plastic deformation takes place is determined by a yield criterion based on a yield function $F$,

$$F(\sigma,\kappa) = f(\sigma) - \kappa \le 0, \tag{2.1}$$

where $f$ is the equivalent stress function and $\kappa$ varies during plastic deformation so that $\kappa = f(\sigma)$ and $F(\sigma,\kappa) = 0$. For any load increment after initial yielding, it is assumed that the increment of strain can be written as the sum of elastic and plastic components, so that

$$d\varepsilon = d\varepsilon_e + d\varepsilon_p, \tag{2.2}$$

where $d\varepsilon_e$ is related to $d\sigma$ by the $D$ matrix of (1.9). The plastic flow of the material is governed by a flow rule which, for associative plasticity, relates the increment of plastic strain to the gradient of the yield function, so that

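$$d\varepsilon_p = d\lambda\,\frac{\partial F}{\partial \sigma}, \tag{2.3}$$

where $d\lambda$ is the plastic multiplier. It is related to the $\kappa$ of (2.1) through a hardening rule, $d\kappa = A\,d\lambda$. When a state of plastic flow exists, stresses must remain on the yield surface, so that

$$dF = \left(\frac{\partial F}{\partial \sigma}\right)^T d\sigma + \frac{\partial F}{\partial \kappa}\,d\kappa = 0,$$

and hence, since $\partial F/\partial\kappa = -1$ by (2.1),

$$a^T d\sigma - A\,d\lambda = 0, \tag{2.4}$$

where the flow vector $a$ is defined by

$$a = \frac{\partial F}{\partial \sigma}.$$

We thus obtain from (2.1)-(2.4) the relation

$$d\varepsilon = \left(D^{-1} + \frac{a a^T}{A}\right) d\sigma,$$

from which we obtain

$$d\sigma = \left(D - \frac{D a a^T D}{A + a^T D a}\right) d\varepsilon \equiv D_{ep}\,d\varepsilon, \tag{2.5}$$

where $D_{ep} \equiv D_{ep}(\sigma,\kappa)$ is the elasto-plastic constitutive matrix. Thus, for the load increment in the post-yield state, we have a nonlinear constitutive relation.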
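The matrix $D_{ep}$ of (2.5) is easy to form once $D$ and the flow vector $a$ are known. The Python sketch below assumes plane stress, the von Mises function $f(\sigma) = (\sigma_{11}^2 - \sigma_{11}\sigma_{22} + \sigma_{22}^2 + 3\sigma_{12}^2)^{1/2}$ and the engineering-shear ordering of (1.5); all helper names are hypothetical. For perfect plasticity ($A = 0$) the consistency condition (2.4) reduces to $a^T d\sigma = 0$, and the final check confirms $a^T D_{ep} = 0$.

```python
import numpy as np


def plane_stress_D(E, nu):
    """Hooke's law matrix for sigma = (s11, s22, s12), eps = (e11, e22, 2*e12)."""
    return E / (1.0 - nu**2) * np.array([[1.0, nu, 0.0],
                                         [nu, 1.0, 0.0],
                                         [0.0, 0.0, (1.0 - nu) / 2.0]])


def von_mises(s):
    return np.sqrt(s[0]**2 - s[0] * s[1] + s[1]**2 + 3.0 * s[2]**2)


def flow_vector(s):
    """a = dF/dsigma for the plane-stress von Mises function above."""
    return np.array([2.0 * s[0] - s[1], 2.0 * s[1] - s[0], 6.0 * s[2]]) / (2.0 * von_mises(s))


def D_ep(D, a, A=0.0):
    """Elasto-plastic matrix of (2.5): D - D a a^T D / (A + a^T D a), D symmetric."""
    Da = D @ a
    return D - np.outer(Da, Da) / (A + a @ Da)


D = plane_stress_D(E=10_000.0, nu=0.3)
sigma = np.array([80.0, 20.0, 30.0])
a = flow_vector(sigma)
Dep = D_ep(D, a, A=0.0)
print(np.allclose(a @ Dep, 0.0, atol=1e-9))  # True: a^T d(sigma) = 0 when A = 0
```

Any strain increment therefore produces a stress increment tangent to the yield surface. As $A \to \infty$ (no plastic flow), $D_{ep}$ reverts to $D$.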

If the Galerkin technique is applied to the elasto-plastic problem in a specific incremental load step, the resulting (nonlinear) global equation system corresponding to (1.10) is

$$\left(\int_\Omega B^T D_p B\,dx\right) d\mathbf{U} - \int_\Omega \mathbf{N}^T df\,dx - \int_{\partial\Omega_T} \mathbf{N}^T dg\,ds = 0, \tag{2.6}$$

where $D_p \equiv D_{ep}$ during yielding and $D_p \equiv D$ otherwise. When yielding takes place the system (2.6) is nonlinear and is solved iteratively within the load step. The two most used methods for this are the initial stiffness and the tangent stiffness methods. If we define

$$K(\sigma,\kappa) \equiv \int_\Omega B^T D_p B\,dx, \tag{2.7}$$

then the iteration for solving (2.6), with general iteration step $i$, is as follows:

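Step 1  Set $i = 1$ and $d\mathbf{U}_1 = 0$, and take $\sigma_1$ and $\kappa_1$ to be their final values from the previous load step. Define

$$R_1 \equiv \int_\Omega \mathbf{N}^T df\,dx + \int_{\partial\Omega_T} \mathbf{N}^T dg\,ds.$$

Step 2  Set $d\mathbf{U}_{i+1} = d\mathbf{U}_i + \tilde{K}^{-1} R_i$. Calculate $\sigma_{i+1}$, $\kappa_{i+1}$, $A_{i+1}$ and $R_{i+1} = R_1 - K(\sigma_{i+1},\kappa_{i+1})\,d\mathbf{U}_{i+1}$.

Step 3  For some tolerance $\epsilon$, if $\|R_{i+1}\| > \epsilon\,\|R_1\|$, set $i := i+1$ and repeat Step 2. Otherwise set $d\mathbf{U} := d\mathbf{U}_{i+1}$ and stop the iteration.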
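The structure of Steps 1-3 can be tried out on a single degree of freedom. The Python sketch below replaces the finite element internal force by a hypothetical softening spring, $f_{\text{int}}(u) = k_0 \tanh u$. It writes the residual in the common form $R_i = R_1 - (f_{\text{int}}(u_0 + dU_i) - f_{\text{int}}(u_0))$, which is an assumption on our part. The iteration matrix is either kept at the initial stiffness $k_0$ or updated to the current tangent, mirroring the initial stiffness and tangent stiffness methods mentioned above. The function name `load_step` is hypothetical.

```python
import math


def load_step(u0, dF, k0=1.0, tangent=False, tol=1e-10, max_iter=200):
    """Solve k0*tanh(u0 + dU) - k0*tanh(u0) = dF for dU following Steps 1-3.

    tangent=False: iteration matrix fixed at the initial stiffness k0.
    tangent=True:  iteration matrix updated to the tangent at the current state.
    Returns (dU, number of iterations).
    """
    def f_int(u):
        return k0 * math.tanh(u)

    def k_tan(u):
        return k0 / math.cosh(u) ** 2

    dU, R1 = 0.0, dF                            # Step 1: dU_1 = 0, R_1 = load increment
    R = R1
    for i in range(1, max_iter + 1):
        K_iter = k_tan(u0 + dU) if tangent else k0
        dU += R / K_iter                        # Step 2: update the increment
        R = R1 - (f_int(u0 + dU) - f_int(u0))   # out-of-balance force
        if abs(R) <= tol * abs(R1):             # Step 3: convergence test
            return dU, i
    raise RuntimeError("no convergence within max_iter")


dU_init, n_init = load_step(0.0, 0.5, tangent=False)
dU_tan, n_tan = load_step(0.0, 0.5, tangent=True)
print(n_init, n_tan)  # the tangent method needs far fewer iterations
```

The trade-off is that the initial stiffness method never re-forms (or re-factorises) the matrix but converges only linearly. The tangent method pays for a new matrix at every iteration and converges quadratically.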

In Step 2 the matrix $\tilde{K}$ can be taken as $K(0,0)$, giving the initial stiffness method, or as the matrix $K(\sigma_i,\kappa_i)$ from the previous iteration step, giving the tangent stiffness method. The values of $\sigma_{i+1}$ and $\kappa_{i+1}$ are calculated using (2.5), (1.5) and the hardening rule. Currently this initial value problem is integrated using the explicit forward Euler scheme. The value obtained is then scaled to lie on the yield surface $F(\sigma,\kappa) = 0$.
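The stress update just described, forward Euler integration of (2.5) followed by scaling back to the yield surface, can be sketched in Python for a single Gauss point. The sketch assumes plane stress, von Mises yielding and perfect plasticity ($A = 0$). The elastic fraction of the strain increment is located by bisection and the plastic part is split into equal substeps. Both are our choices, not details given in the text, and the function name `update_stress` is hypothetical.

```python
import numpy as np


def plane_stress_D(E, nu):
    return E / (1.0 - nu**2) * np.array([[1.0, nu, 0.0],
                                         [nu, 1.0, 0.0],
                                         [0.0, 0.0, (1.0 - nu) / 2.0]])


def von_mises(s):
    return np.sqrt(s[0]**2 - s[0] * s[1] + s[1]**2 + 3.0 * s[2]**2)


def flow_vector(s):
    return np.array([2.0 * s[0] - s[1], 2.0 * s[1] - s[0], 6.0 * s[2]]) / (2.0 * von_mises(s))


def update_stress(sigma, deps, sigma_y, D, n_sub=50):
    """Forward Euler integration of d(sigma) = D_ep d(eps), then scaling onto F = 0."""
    if von_mises(sigma + D @ deps) <= sigma_y:
        return sigma + D @ deps                  # purely elastic increment
    lo, hi = 0.0, 1.0                            # elastic fraction r of the increment
    if von_mises(sigma) < sigma_y:
        for _ in range(60):                      # bisection on F(sigma + r D deps) = 0
            mid = 0.5 * (lo + hi)
            if von_mises(sigma + mid * D @ deps) < sigma_y:
                lo = mid
            else:
                hi = mid
    sigma = sigma + lo * D @ deps
    step = (1.0 - lo) * deps / n_sub
    for _ in range(n_sub):                       # explicit forward Euler substeps with D_ep
        a = flow_vector(sigma)
        Da = D @ a
        sigma = sigma + (D - np.outer(Da, Da) / (a @ Da)) @ step
    return sigma * (sigma_y / von_mises(sigma))  # scale back onto the yield surface


D = plane_stress_D(E=10_000.0, nu=0.3)
s = update_stress(np.zeros(3), np.array([0.03, 0.0, 0.01]), sigma_y=100.0, D=D)
print(von_mises(s))  # equal to the yield stress 100 up to round-off
```

Because the von Mises function is homogeneous of degree one in $\sigma$, a single scalar factor is enough to remove the drift that the explicit scheme accumulates off the yield surface.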

2.2 J-Integral for Elasto-Plastic Fracture

The path-independent J-integral of Rice [8] can be used in linear elastic fracture to obtain the stress intensity factor. For a Mode I problem with a crack having faces parallel to the $x_1$-axis, J is defined as, see [3],

$$J = \int_\Gamma \left( W\,dx_2 - \sum_{i=1}^{2} T_i \frac{\partial u_i}{\partial x_1}\,ds \right), \tag{2.8}$$

where $\Gamma$ is a contour running anticlockwise from the lower to the upper crack face and enclosing the crack tip, and $W$ is the strain energy density. The $T_i = \sum_j \sigma_{ij} n_j$ are the components of the traction vector on $\Gamma$, defined with respect to the outward normal $n$ to $\Gamma$, and $ds$ is the increment of arc length.
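In the linear elastic case the link between $J$ and the Mode I stress intensity factor is Irwin's relation $J = K_I^2/E'$. Here $E' = E$ in plane stress and $E' = E/(1-\nu^2)$ in plane strain. A small Python helper makes the conversion explicit; the name `k_from_j` is hypothetical.

```python
import math


def k_from_j(J, E, nu=0.0, plane_strain=False):
    """Mode I stress intensity factor from J in linear elastic fracture."""
    E_eff = E / (1.0 - nu**2) if plane_strain else E
    return math.sqrt(E_eff * J)


E, nu, K = 10_000.0, 0.3, 50.0
J_stress = K**2 / E                   # plane stress
J_strain = K**2 * (1.0 - nu**2) / E   # plane strain
print(k_from_j(J_stress, E), k_from_j(J_strain, E, nu, plane_strain=True))  # both 50.0 up to round-off
```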

The application of a similar $J_p$-integral in elasto-plastic fracture is motivated by the work of Hutchinson [3] and Rice and Rosengren [9]. They obtained the forms of the near-tip HRR stress and strain fields in power-law hardening materials, based on the deformation theory of plasticity. These forms indicate that $J_p$ plays the same role in elasto-plastic fracture as $J$ does in linear elastic fracture. However, as the mathematical theory of plasticity developed above is for incremental plasticity, our use of $J_p$ in this context needs some justification. The basis for this is that under monotonic loading conditions incremental plasticity and deformation plasticity produce similar results, so that a secondary quantity $J_p$ derived from either model will also be similar.

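The $J_p$-integral is calculated using (2.8), noting that $W$ now has both elastic and plastic components $W_e$ and $W_p$, so that

$$W = W_e + W_p, \tag{2.9}$$

with

$$W_e = \tfrac{1}{2}\,\sigma^T \varepsilon_e \quad \text{and} \quad W_p = \int_0^{\lambda} f(\sigma)\,d\lambda.$$

Note that with the von Mises yield criterion, $f(\sigma) = \left(\tfrac{3}{2}\sigma_{ij}\sigma_{ij} - \tfrac{1}{2}\sigma_{kk}^2\right)^{1/2}$, we have $W_p = \int \bar{\sigma}\,d\bar{\varepsilon}_p$, where $\bar{\sigma}$ and $\bar{\varepsilon}_p$ are respectively the effective stress and effective plastic strain, see [6].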
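At a Gauss point, the split (2.9) needs the current stress and elastic strain for $W_e$, and the recorded history of effective stress and effective plastic strain for $W_p$. The Python sketch below accumulates $W_p = \int \bar\sigma\,d\bar\varepsilon_p$ with the trapezoidal rule over load steps. That integration rule is our choice rather than one stated in the text, and the function names are hypothetical.

```python
import numpy as np


def elastic_energy(sigma, eps_e):
    """W_e = 1/2 sigma^T eps_e, with eps_e = (e11, e22, 2*e12)."""
    return 0.5 * float(np.dot(sigma, eps_e))


def plastic_work(sigma_bar, eps_p_bar):
    """W_p = integral of sigma_bar d(eps_p_bar) over recorded load steps (trapezoidal rule)."""
    s = np.asarray(sigma_bar, dtype=float)
    e = np.asarray(eps_p_bar, dtype=float)
    return float(np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(e)))


# Elastic perfectly plastic history: sigma_bar stays at the yield stress 100
# (E = 10,000 and nu = 0.3 give the elastic strains below for uniaxial stress 100).
W_e = elastic_energy(np.array([100.0, 0.0, 0.0]), np.array([0.01, -0.003, 0.0]))
W_p = plastic_work([100.0] * 5, [0.0, 0.002, 0.005, 0.008, 0.01])
print(W_e, W_p, W_e + W_p)  # W = W_e + W_p as in (2.9)
```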


$(J_p)_h$ is approximated in each load step of the deformation from the finite element approximations $u_h(x)$, $\sigma_h(x)$ and $\varepsilon_h(x)$ derived as in Section 1.2. The method is a straightforward discretisation of (2.8), using values calculated at the Gauss quadrature points in the numerical integration and the splitting (2.9).

2.3 Mode I Elasto-Plastic Fracture Problem


A two-dimensional plane stress Mode I elasto-plastic fracture problem with a centre crack, see Fig. 1, has been modelled using the techniques of Sections 1.2 and 1.3, assuming a von Mises yield condition.


Fig.  1 

This model problem has been treated by Owen and Fawkes [6]. An elastic perfectly plastic material has been assumed, so that in (2.4), and subsequently, $A = 0$. The width of the region is 2/5ths of the length and the crack length is 2/5ths of the width. There is a normal tensile load of 100 units on each end, Young's modulus is $E = 10{,}000$, Poisson's ratio is $\nu = 0.3$ and the uniaxial yield stress is 100.

A basic finite element mesh based on 8-node quadrilateral elements is put over the quarter region as shown in Fig. 2. This mesh is refined locally around the crack tip to investigate the effect that this has on near-crack-tip $(J_p)_h$ values. Contours $\Gamma$ surrounding the crack tip are then defined; the top half of some of these is shown in Fig. 3, and they pass through the $2 \times 2$ Gauss points of each element.

It is found that the calculated values of the $(J_p)_h$ integral are the same, within each load step, irrespective of whether the contours pass through zones of elastic or plastic deformation. The only exceptions, for any attempted level of refinement, are the values calculated on contours passing through elements which have the crack tip as a node.

We conjecture that this effect is due to the poor accuracy of the approximation in these elements at the crack tip; it should be emphasised that in these elements no effort has been made to model the form of the singularity. In this near-tip "failure zone" it is believed that constitutive laws of the above type break down, so that further modelling is necessary to represent the behaviour in this zone. We consider that outside the failure zone the calculated values of $(J_p)_h$, being virtually constant, are reliable and thus can be used in a fracture criterion.


3. VISCOELASTIC FRACTURE WITH A FAILURE ZONE

3.1 Viscoelastic Model and Finite Element Discretization

We now consider viscoelastic materials which have the property that the displacement $u = u(x,t)$ at a point $x \in \Omega$ and time $t$ depends on the previous history at that point, i.e. on $u(x,\tau)$, $\tau \le t$. The weak problem at each time $t$ relating to the equilibrium equations (1.1)-(1.3) for a general viscoelastic material is then

$$\int_\Omega \sigma\big(u(x,\tau);\ \tau \le t\big)^T \varepsilon(v)\,dx - \int_\Omega f(t)^T v\,dx - \int_{\partial\Omega_T} g(t)^T v\,ds = 0 \quad \forall\, v \in V, \tag{3.1}$$

where the test space $V$ remains as in Section 1.2, i.e. $v \in V$ does not involve time. We limit discussion here to linear viscoelastic materials in which the constitutive equation has the form


$$\sigma(x,t) = \int_{-\infty}^{t} D(t-\tau)\,\frac{\partial \varepsilon(x,\tau)}{\partial \tau}\,d\tau, \tag{3.2}$$

where $D$ is the stress relaxation matrix. We further restrict discussion to problems for which there is no deformation for time $t < 0$, i.e. $u(x,\tau) = 0$ for $\tau < 0$. This implies that $\varepsilon(x,\tau) = 0$ for $\tau < 0$, and thus the lower limit of the time integral in (3.2) can be replaced by 0.

In discretising (3.1)-(3.2) using the Galerkin technique at time $t$ we approximate $u(x,t)$ by $u_h(x,t)$, where

$$u_h(x,t) = \mathbf{N}(x)\,\mathbf{U}(t), \tag{3.3}$$

with $\mathbf{U}(t)$ denoting the vector of displacement nodal variables at time $t$. Thus the approximate stress is given by

$$\sigma_h(x,t) = \int_0^t D(t-\tau)\,B\,\dot{\mathbf{U}}(\tau)\,d\tau, \tag{3.4}$$

and the discrete form of (3.1) is, at time $t$, the linear integro-differential equation system

$$\int_0^t \left(\int_\Omega B^T D(t-\tau) B\,dx\right) \dot{\mathbf{U}}(\tau)\,d\tau - \int_\Omega \mathbf{N}^T f(t)\,dx - \int_{\partial\Omega_T} \mathbf{N}^T g(t)\,ds = 0. \tag{3.5}$$


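We discretise (3.5) in time by taking time levels $t_j$, $j = 0, 1, 2, \ldots$. For $t_{j-1} < \tau < t_j$ we approximate $\dot{\mathbf{U}}(\tau)$ by $(\mathbf{U}(t_j) - \mathbf{U}(t_{j-1}))/(t_j - t_{j-1})$. This gives

$$\left(\int_\Omega B^T \bar{D}_j B\,dx\right)\big(\mathbf{U}(t_j) - \mathbf{U}(t_{j-1})\big) = \int_\Omega \mathbf{N}^T f(t_j)\,dx + \int_{\partial\Omega_T} \mathbf{N}^T g(t_j)\,ds + P(t_j), \tag{3.6}$$

where

$$\bar{D}_j = \frac{1}{t_j - t_{j-1}} \int_{t_{j-1}}^{t_j} D(t_j - \tau)\,d\tau$$

and

$$P(t_j) = -\sum_{q=1}^{j-1} \int_\Omega B^T \left( \frac{1}{t_q - t_{q-1}} \int_{t_{q-1}}^{t_q} D(t_j - \tau)\,d\tau \right) B\,dx\,\big(\mathbf{U}(t_q) - \mathbf{U}(t_{q-1})\big).$$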

For a general form of $D$ the computation of $P(t_j)$ involves the solution at all previous time steps. We, however, restrict attention to materials with constant Poisson's ratio and matrices $D$ which can be expressed in terms of decaying exponentials, i.e.

$$D(s) = \varphi(s)\,D(0), \qquad \varphi(s) = \sum_{k=1}^{M} C_k e^{-\alpha_k s}, \quad \alpha_k \ge 0, \tag{3.7}$$

where $\varphi$ is the stress relaxation function. In this way, at each time level $t_j$, $P(t_j)$ can be computed from the $M$ vectors $y(\alpha_k, t_{j-1})$, $k = 1(1)M$, where $y(\cdot,\cdot)$ is defined by


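$$y(\alpha, t_q) = \int_0^{t_q} e^{-\alpha(t_q - \tau)}\,\dot{\mathbf{U}}(\tau)\,d\tau. \tag{3.8}$$

This greatly simplifies the numerical algorithm. Each vector satisfies $y(\alpha_k, t_j) = e^{-\alpha_k (t_j - t_{j-1})}\,y(\alpha_k, t_{j-1}) + \int_{t_{j-1}}^{t_j} e^{-\alpha_k(t_j-\tau)}\dot{\mathbf{U}}(\tau)\,d\tau$, so it can be updated from its value at the previous time level and the full solution history need not be stored. Also, because of the assumption of constant Poisson's ratio, the problem is in a form where correspondence principles can be used, see e.g. Schapery [11].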
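The saving offered by (3.7)-(3.8) can be checked on a scalar analogue. The Python sketch below takes the relaxation function $\varphi(s) = 0.1 + 0.9e^{-s}$ used later in this section, so $C = (0.1, 0.9)$ and $\alpha = (0, 1)$. It uses a strain history that is piecewise linear in time, as in the time discretisation of (3.5). It then evaluates $\sigma(t_j) = \int_0^{t_j} \varphi(t_j - \tau)\dot\varepsilon(\tau)\,d\tau$ in two ways: by the direct sum over all previous steps, and by the recursive update of $y(\alpha_k, t_j)$, which stores one number per exponential. The scalar setting and the function names are our assumptions.

```python
import math


def kernel_integral(alpha, t_end, a, b):
    """Integral of exp(-alpha*(t_end - tau)) for tau in [a, b]."""
    if alpha == 0.0:
        return b - a
    return (math.exp(-alpha * (t_end - b)) - math.exp(-alpha * (t_end - a))) / alpha


def stress_direct(C, alpha, t, eps):
    """sigma(t_j) for every j by summing over the whole history (O(j) work per step)."""
    out = []
    for j in range(1, len(t)):
        s = 0.0
        for q in range(1, j + 1):
            rate = (eps[q] - eps[q - 1]) / (t[q] - t[q - 1])
            s += rate * sum(c * kernel_integral(al, t[j], t[q - 1], t[q])
                            for c, al in zip(C, alpha))
        out.append(s)
    return out


def stress_recursive(C, alpha, t, eps):
    """Same values via y(alpha_k, t_j) = exp(-alpha_k dt) y(alpha_k, t_{j-1}) + new part."""
    y = [0.0] * len(C)
    out = []
    for j in range(1, len(t)):
        dt = t[j] - t[j - 1]
        rate = (eps[j] - eps[j - 1]) / dt
        for k, al in enumerate(alpha):
            y[k] = math.exp(-al * dt) * y[k] + rate * kernel_integral(al, t[j], t[j - 1], t[j])
        out.append(sum(c * yk for c, yk in zip(C, y)))
    return out


C, alpha = [0.1, 0.9], [0.0, 1.0]          # phi(s) = 0.1 + 0.9 exp(-s)
t = [0.0, 0.1, 0.3, 0.6, 1.0, 1.5, 2.5]    # non-uniform time levels
eps = [0.0, 0.01, 0.015, 0.02, 0.02, 0.025, 0.03]
direct, recursive = stress_direct(C, alpha, t, eps), stress_recursive(C, alpha, t, eps)
print(max(abs(p - q) for p, q in zip(direct, recursive)))  # round-off level
```

The direct evaluation costs $O(j)$ work at step $j$, so $O(j^2)$ over a run. The recursive form costs $O(M)$ per step, which is the simplification the text refers to.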

3.2  Viscoelastic  Fracture 

For a cracked viscoelastic body we now consider the choice of realistic fracture criteria for the onset of crack propagation, and the approximation of such criteria using the numerical scheme outlined in Section 3.1. For viscoelastic materials this involves the introduction of a Barenblatt-type failure zone about the crack tip, see Barenblatt [1], as was first considered by Knauss [5] and Schapery [10]. The failure zone is introduced for two reasons. First, it attempts to describe the physics of the process about the crack tip, i.e. to model the cohesive forces and the region of localised damage which occurs about the tip. Second, criteria which are not based on such a failure zone are found, in the quasi-static viscoelastic case, to give crack growth predictions which are greatly at variance with experimental observation.

In our model the Barenblatt failure zone is mathematically a small interval of length $a_f$ behind the crack tip on which cohesive stresses $\sigma_f$ act in order to cancel the stress singularity produced at the crack tip by the external loads on the body. We assume that these cohesive stresses are constant on each crack face, as indicated in Fig. 4.

Fig. 4

A physically motivated fracture criterion that we consider is then based upon a critical value for the work input to the failure zone, i.e. when the work done by the cohesive stresses in this zone exceeds the critical value we predict that the crack will propagate. Since this work is given by $2L_f u_2(-a_f, 0^+, t)$ for the problem indicated in Fig. 4, the criterion can equivalently be expressed in terms of a critical crack opening displacement (COD). In the case of an elastic material this criterion is equivalent to the traditional stress-intensity-factor-based criterion, see e.g. Kanninen and Popelar [4, p. 63], but it is qualitatively and quantitatively a different criterion in the viscoelastic case.

There are several numerical difficulties arising from the inclusion of the failure zone. Firstly, because the zone is small relative to the rest of the domain, local refinement of the finite element mesh is required about the tip. Secondly, the equation required to determine $a_f$ in order to cancel the stress singularity is nonlinear, and thus an iterative scheme must be developed. The algorithm for doing this will be described fully in Walton, Warby and Whiteman [12]. We give here only a brief description of the method in the context of a Mode I fracture problem in a region and with external loading as in Fig. 1, the load $L_e(t)$ now being time dependent. The constraint boundary conditions are not time dependent. As indicated in the previous discussion, we also assume that loads of magnitude $L_f$ act normal to the crack faces in the failure zone, as in Fig. 4.

In order to calculate $a_f$ we make use of the linearity of the model in terms of displacement. We consider two Mode I problems of the above form: (i) with only the external loading $L_e(t)$, and (ii) with only the failure zone loading $L_f$. If $K_e = K_e(t)$ and $K_f = K_f(a_f(t))$ denote respectively the stress intensity factors of these problems, then an equation that we can use to determine $a_f = a_f(t)$ is

$$K_e(t) + K_f(a_f(t)) = 0. \tag{3.9}$$


To determine $K_e$ and $K_f$ we use a correspondence principle of Schapery [11], which relates the solution of the viscoelastic problem to the solution of reference elastic problems with the same geometry, loadings and boundary conditions. More specifically, it shows that the stress in the viscoelastic case is the same as the stress in the reference elastic problem at $t$. Hence, since a J-integral of the form (2.8) is related to the square of the stress intensity factor in the elastic case, we can replace (3.9) by

$$J_e^R(t) - J_f^R(a_f, t) = 0, \tag{3.10}$$

where $J_e^R$ and $J_f^R$ denote respectively the J-integrals of the reference elastic problems due to the loads $L_e$ and $L_f$ applied separately. To approximate (3.10) numerically there are two roughly equivalent approaches:

(a) solve the reference elastic problem at each time step and calculate the viscoelastic displacement, when required, by a time convolution with the creep function;

(b) solve the viscoelastic problem as in Section 3.1 and calculate the reference elastic displacement required in (3.10) by a time convolution with the relaxation function.

We adopt approach (b) here. Our numerical algorithm then runs as follows. For $t = t_j$, $j = 1, 2, \ldots$, solve (3.10) for $a_f = a_f(t_j)$ by Newton's method, where each step of Newton's method involves the solution of the finite element equations. Then evaluate the crack opening displacement (COD), given by $u_2(-a_f, 0^+, t_j)$, and compare it with the critical value. If the COD exceeds the critical value, then determine $a_f$ and $t_{cr}$, $t_{j-1} < t_{cr} < t_j$, by Newton's method so that (3.10) and

$$u_2(-a_f, 0^+, t_{cr}) = \text{critical COD} \tag{3.11}$$

are satisfied. Our value of $t_{cr}$ is then our prediction for the time at which the crack will propagate.
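The first Newton iteration of this algorithm, solving (3.9) for $a_f$, can be illustrated with closed-form stand-ins for the finite element evaluations. These stand-ins are assumptions made only for this sketch. We take $K_e = L_e\sqrt{\pi c}$ for a centre crack of half-length $c$ in an infinite elastic plate. We take $K_f(a_f) = -2\sigma_f\sqrt{2a_f/\pi}$ for a uniform closing stress $\sigma_f$ acting over a length $a_f$ behind the tip of a semi-infinite crack. For these forms, (3.9) has the exact root $a_f = \pi K_e^2/(8\sigma_f^2)$, which the Python sketch uses as a check. In the method described above, each residual evaluation instead requires a finite element solve. The function name `solve_failure_zone` and the numerical values are hypothetical.

```python
import math


def solve_failure_zone(K_e, sigma_f, a0=1.0, tol=1e-13, max_iter=100):
    """Newton's method for K_e + K_f(a) = 0 with K_f(a) = -2 sigma_f sqrt(2a/pi)."""
    def residual(a):
        return K_e - 2.0 * sigma_f * math.sqrt(2.0 * a / math.pi)

    def slope(a):
        return -sigma_f * math.sqrt(2.0 / (math.pi * a))

    a = a0
    for _ in range(max_iter):
        a_new = a - residual(a) / slope(a)
        if a_new <= 0.0:            # safeguard (our addition): keep the zone length positive
            a_new = 0.5 * a
        if abs(a_new - a) <= tol * a:
            return a_new
        a = a_new
    raise RuntimeError("Newton's method did not converge")


L_e, c, sigma_f = 0.05, 0.5, 1.0    # illustrative values only
K_e = L_e * math.sqrt(math.pi * c)
a_f = solve_failure_zone(K_e, sigma_f)
print(a_f, math.pi * K_e**2 / (8.0 * sigma_f**2))  # Newton result vs closed form
```

The safeguard matters because a plain Newton step from a starting guess far above the root can jump to a negative zone length. Halving the guess brings it back into the region where the iteration converges quadratically.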

The above problem is solved with the normalised relaxation function $\varphi(t) = (1 + 9e^{-t})/10$, which corresponds to the normalised creep function $\psi(t) = 10 - 9e^{-0.1t}$. We non-dimensionalise the Lamé parameters $\lambda_0$ and $\mu_0$ relating to $D(0)$ of (3.7) by setting $\mu_0 = 1$ and taking Poisson's ratio $\nu = 0.49$. Young's modulus is then $E = 2.98$ and $\lambda_0 = \nu E/((1+\nu)(1-2\nu)) = 49$.
The  failure  load  Lf  =  10“^  and  the  external  load  is  applied  at  the  three 
different rates

$$L_e(t) = 10^{-3}t, \quad 10^{-2}t \quad \text{and} \quad 10^{-1}t.$$
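The relaxation and creep functions and the elastic constants quoted above can be checked numerically. A relaxation function $\varphi$ and its corresponding creep function $\psi$ satisfy $\varphi(t)\psi(0) + \int_0^t \varphi(t-\tau)\psi'(\tau)\,d\tau = 1$ for all $t \ge 0$. This is the standard interconversion relation of linear viscoelasticity, stated here as background rather than taken from the text. The Python sketch below evaluates the left-hand side with composite Simpson's rule and recomputes $E$ and $\lambda_0$ from $\mu_0 = 1$ and $\nu = 0.49$.

```python
import math


def phi(t):
    return (1.0 + 9.0 * math.exp(-t)) / 10.0   # normalised relaxation function


def psi(t):
    return 10.0 - 9.0 * math.exp(-0.1 * t)     # normalised creep function


def dpsi(t):
    return 0.9 * math.exp(-0.1 * t)            # psi'(t)


def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    total += 4.0 * sum(g(a + i * h) for i in range(1, n, 2))
    total += 2.0 * sum(g(a + i * h) for i in range(2, n, 2))
    return total * h / 3.0


def interconversion(t):
    """phi(t) psi(0) + integral_0^t phi(t - tau) psi'(tau) d tau; equals 1 for a matching pair."""
    return phi(t) * psi(0.0) + simpson(lambda tau: phi(t - tau) * dpsi(tau), 0.0, t)


mu0, nu = 1.0, 0.49
E = 2.0 * mu0 * (1.0 + nu)                       # Young's modulus from mu0 and nu
lam0 = nu * E / ((1.0 + nu) * (1.0 - 2.0 * nu))  # Lame parameter lambda_0
print([interconversion(t) for t in (0.5, 2.0, 10.0)], E, lam0)
```

The near-incompressible choice $\nu = 0.49$ is what makes $\lambda_0$ fifty times larger than $\mu_0$, since $1 - 2\nu = 0.02$ appears in the denominator.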

We assume that $\Omega$ is a square of side 2 and consider a selection of crack lengths between 0.25 and 1.50. A typical mesh consisting of 8-noded quadrilaterals is shown in Fig. 5. The values of $a_f$, $t_{cr}$ and $L_e(t_{cr})$ are given in Table 1; they show a clear dependence on the loading rate. A stress intensity factor criterion, by contrast, would give a crack propagation criterion independent of loading rate, since $K_e$ depends only on $L_e$, and it would thus ignore creep effects. Hence there can clearly be considerable differences between J-based and COD-based fracture criteria.
Table 1

a      L_e(t)     a_f     t_cr        L_e(t_cr)
0.25   10^-1 t    0.138   6.245(-1)   6.245(-2)
       10^-2 t    0.052   3.442       3.442(-2)
       10^-3 t    0.022   2.044(1)    2.044(-2)
0.5    10^-1 t    0.129   3.269(-1)   3.269(-2)
       10^-2 t    0.065   2.131       2.131(-2)
       10^-3 t    0.024   1.216(1)    1.216(-2)
1.0    10^-1 t    0.127   1.309(-1)   1.309(-2)
       10^-2 t    0.086   9.962(-1)   9.962(-3)
       10^-3 t    0.035   5.860       5.860(-3)
1.5    10^-1 t    0.120   4.293(-2)   4.293(-3)
       10^-2 t    0.101   3.758(-1)   3.758(-3)
       10^-3 t    0.055   2.408       2.408(-3)

Entries of the form x(n) denote x × 10^n.
Future Work


The next phase of this work is to simulate, using finite elements, the problem of a moving crack in a viscoelastic body. The aim is to determine the conditions under which the crack will propagate stably or unstably. Details of this work are to be contained in [12].
ABSTRACT

The interaction of a planar shock wave with a small amplitude fluid interface is characterized by the production of diffracted wave patterns that correspond to Galilean transforms of slowly varying perturbations of stationary waves. These waves are described by solutions to Riemann problems for the steady state Euler equations. When the amplitude of the interface is not small or the geometry of the two waves is changing, bifurcations in the solution occur. This article will analyze such a bifurcation for a shock wave in a dense material diffracting into a lighter material. The small amplitude case produces reflected Prandtl-Meyer rarefaction waves, and the bifurcations that occur at larger amplitudes can be interpreted as a two-dimensional analogue of a rarefaction wave overtaking a shock. This analysis is incorporated into a front tracking code and provides a high quality description of the interacting waves.


1.  Introduction 

The  interaction  of  a  shock  wave  with  a  fluid  interface  can  be  subdivided  into 
three  regimes.  These  are  a  period  of  collision,  a  small  amplitude  linear  growth 
regime,  and  the  long  time  non-linear  growth  of  instabilities  in  the  interface. 

The collision stage is characterized by the production of complicated diffracted wave patterns [Jahn, 1956; Abd-El-Fattah et al., 1976; Abd-El-Fattah & Henderson, 1978a; Abd-El-Fattah & Henderson, 1978b; Grove, 1989], and is extremely nonlinear. In what is called the case of regular diffraction, these waves are perturbations of Galilean transforms of solutions to a Riemann problem for a steady flow with supersonic data. An asymptotic analysis [Grove, 1989] of the wave curves for the stationary Euler equations shows that the regular case occurs provided the angle between the two incoming waves is small. This observation means that the irregular wave patterns observed for larger amplitude interactions can be studied as two-dimensional bifurcations of regular solutions.

1. Supported in part by the Army Research Office, grant DAAG29-85-0188.
The second period of the interaction [Richtmyer, 1960] corresponds to the growth of unstable modes in an impulsively loaded material interface that is a small amplitude perturbation of a planar interface. During this regime the flow is an approximate solution of the linearized Euler equations and the interface instabilities grow at an exponential rate.

The third stage concerns the long term growth of the unstable interface [Youngs, 1984; Mikaelian, 1986; Grove, 1989]. The flow here is characterized by the competition and mixing of unstable modes and leads to an eventual chaotic behavior of the material surface.

The main thrust of this article is to study the bifurcation behavior of a regular shock-contact diffraction pattern where the reflected wave is a Prandtl-Meyer rarefaction. Such waves occur in shock-fluid interface interactions where the shock is incident in the denser fluid. The bifurcation occurs when changes in the incident shock strength or in the geometries of the interacting curves cause the state behind the incident shock to become subsonic. When this happens the reflected Prandtl-Meyer wave overtakes the incident shock from behind. The overtaking of the incident shock by the Prandtl-Meyer wave both dampens and bends the incident shock, with corresponding effects in the transmitted shock wave. Two classes of examples of such interactions will be described: a planar shock wave in water diffracting through an air bubble, and a cylindrically expanding shock wave diffracting through a planar air-water interface. Previous work [Grove, 1989] used the method of front tracking to model regular diffraction cases. A major goal of the present work is to be able to track these waves beyond the "regular" regime. A first order analysis of this wave bifurcation is incorporated into the front tracking algorithm and gives an accurate description of the colliding waves throughout their interaction.

2.  The  Supersonic  Steady  State  Riemann  Problem 

The equations of motion for an inviscid, non-heat-conducting fluid are given by the well-known Euler equations:

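$$\partial_t \rho + \nabla\cdot(\rho\,\mathbf{q}) = 0, \tag{2.1.1}$$

$$\partial_t(\rho\,\mathbf{q}) + \nabla\cdot(\rho\,\mathbf{q}\otimes\mathbf{q}) + \nabla P = \rho\,\mathbf{g}, \tag{2.1.2}$$

$$\partial_t(\rho\,\mathcal{E}) + \nabla\cdot\big((\rho\,\mathcal{E} + P)\,\mathbf{q}\big) = \rho\,\mathbf{q}\cdot\mathbf{g}. \tag{2.1.3}$$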

Here $\mathbf{q}$ is the particle velocity, $\rho$ is the mass density, $P$ is the thermodynamic pressure, $\mathbf{g}$ is the constant gravitational acceleration vector, $\mathcal{E} = \tfrac{1}{2}|\mathbf{q}|^2 + E$ is the specific total energy, and $E$ is the specific internal energy. These equations express the conservation of mass, momentum, and energy, respectively. Assuming non-reactive equilibrium thermodynamics, this system is closed by a thermodynamic equation of state

$$E = E(V, S), \tag{2.2}$$

where $E(V, S)$ is a convex function of the specific volume $V = 1/\rho$ and the specific entropy $S$. $E(V, S)$ satisfies the first law of thermodynamics:

$$T\,dS = dE + P\,dV, \tag{2.3}$$

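where $T$ is the absolute temperature. Equation (2.3) implies that

$$P = -\left(\frac{\partial E}{\partial V}\right)_S, \qquad T = \left(\frac{\partial E}{\partial S}\right)_V. \tag{2.4}$$

In practice $S$ is eliminated from equations (2.2) and (2.4), and the thermodynamics of the fluid is described by an incomplete equation of state

$$P = P(E, V). \tag{2.5}$$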

The key to modeling shock wave-fluid interface interactions using a front tracking method is the notion of a two-dimensional elementary wave [Glimm et al., 1985]. Elementary waves describe the downstream scattering of a pair of interacting waves and are calculated by solving a Riemann problem for the steady-state Euler equations, in which the stream direction serves as a timelike axis. Briefly, for a shock-contact interaction, the state behind the incoming shock wave and the state on the side of the material interface opposite to the incoming shock serve as data for a downstream-directed Riemann problem. Since the actual collision of the two waves occurs over a short interval in time, gravity can be neglected in the analysis.

Restricting (2.1) to time-independent planar flow with $\mathbf{g} = 0$, we obtain a system of four conservation laws. This system is hyperbolic for supersonic flow. For most single-phase flows [Thompson, 1971], and certainly for the simple analytic equations of state used in the numerical simulations below, this system has two genuinely non-linear eigenvalues, which correspond to the propagation of sound waves, and two linearly degenerate modes, which travel with the particle velocity.
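To make the hyperbolicity condition concrete, the sketch below computes the characteristic slopes $dy/dx$ of steady planar flow from the polar flow angle $\theta$ and the Mach angle $\mu = \arcsin(1/M)$: the two acoustic (genuinely non-linear) families run along $\theta \pm \mu$, and the two linearly degenerate families run along the streamline. It is a minimal illustration under stated assumptions, not code from the front tracking package; the function name and the restriction $u \neq 0$ are choices of the sketch. For subsonic flow the Mach angle is not real, and the function refuses the input.

```python
import math


def characteristic_slopes(u, v, c):
    """Characteristic slopes dy/dx for steady planar Euler flow.

    (u, v) is the particle velocity and c the sound speed. Returns the two
    acoustic slopes tan(theta - mu), tan(theta + mu) and the streamline slope
    v / u shared by the two linearly degenerate fields. Assumes u != 0 and
    theta +/- mu != +/- pi/2 so that every slope is finite.
    """
    q = math.hypot(u, v)
    if q <= c:
        # For M < 1 the Mach angle is not real, and at M = 1 the two acoustic
        # families coincide: the steady system is not strictly hyperbolic.
        raise ValueError("flow is not supersonic")
    theta = math.atan2(v, u)   # polar flow angle
    mu = math.asin(c / q)      # Mach angle, mu = arcsin(1/M)
    return math.tan(theta - mu), math.tan(theta + mu), v / u


# Flow at Mach ~2.06, inclined about 14 degrees to the x axis.
print(characteristic_slopes(2.0, 0.5, 1.0))
```

The acoustic slopes agree with the roots of $(u^2 - c^2)\lambda^2 - 2uv\lambda + (v^2 - c^2) = 0$, which is a convenient independent check. The same Mach angle reappears in section 4, where an untracked reflected characteristic is recovered from the state behind the incident shock.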

Both the pressure and the polar velocity angle are partial Riemann invariants for the linearly degenerate fields, so the Riemann problem for this system can be solved by finding the intersection of the wave curves for the two genuinely non-linear fields in the pressure-flow angle phase plane. This is analogous to the solution of the Riemann problem for time-dependent one-dimensional flow [Menikoff, 1988], with some important differences. The Hugoniot locus portions of these wave curves are the well-known shock polars [Courant & Friedrichs, 1948], and they extend into the subsonic region where the flow ceases to be hyperbolic. The shock polars also form closed and bounded loops. These two facts lead to a loss of existence or uniqueness for the solution of the general steady-state Riemann problem with supersonic data. The non-uniqueness is addressed by choosing the supersonic solution whenever it exists [Henderson, 1966]. It can be shown that, under suitable conditions on the equation of state, at most one such solution is possible. Also, a linear stability analysis [Henderson & Atkinson, 1976] shows that when both a supersonic and a subsonic solution exist, only the supersonic solution is stable. The non-existence of a solution must be resolved by allowing the flow to become time dependent. This leads to the development of many irregular diffraction patterns [Jahn, 1956; Abd-El-Fattah & Henderson, 1978a; Abd-El-Fattah & Henderson, 1978b]. The next section will describe one such irregular wave, whose structure corresponds to the overtaking of an oblique shock wave by a Prandtl-Meyer wave.
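A shock polar can be traced by sweeping the shock angle $\beta$ from the Mach angle (a zero-strength shock) to $\pi/2$ (a normal shock) and recording the flow deflection and the pressure behind the shock. The sketch below does this for the stiffened polytropic equation of state (5.1) used in section 5, relying on the fact that this equation of state obeys the ideal-gas Rankine-Hugoniot relations with $P$ replaced by $P + P_\infty$. The function name, the uniform sampling in $\beta$, and returning only the upper (positive deflection) branch are assumptions of the sketch; the lower branch is its mirror image, and together they close the loop.

```python
import math


def stiffened_shock_polar(rho0, p0, q0, gamma, p_inf, n=200):
    """Upper branch of the shock polar for a stiffened polytropic gas.

    Upstream state: density rho0, pressure p0, speed q0 (node frame).
    Returns (deflection angle theta, pressure behind shock) pairs for shock
    angles beta from the Mach angle up to pi/2. Uses the ideal-gas
    Rankine-Hugoniot relations in the shifted pressure p + p_inf.
    """
    c0 = math.sqrt(gamma * (p0 + p_inf) / rho0)
    if q0 <= c0:
        raise ValueError("upstream flow must be supersonic")
    mu0 = math.asin(c0 / q0)  # Mach angle: zero-strength shock
    polar = []
    for k in range(n + 1):
        beta = mu0 + (0.5 * math.pi - mu0) * k / n  # shock angle to the flow
        un0 = q0 * math.sin(beta)                   # normal velocity ahead
        mn2 = (un0 / c0) ** 2                       # normal Mach number squared
        rho1 = rho0 * (gamma + 1) * mn2 / ((gamma - 1) * mn2 + 2)
        p1 = p0 + (p0 + p_inf) * 2 * gamma / (gamma + 1) * (mn2 - 1)
        un1 = un0 * rho0 / rho1                     # mass flux is continuous
        qt = q0 * math.cos(beta)                    # tangential velocity is continuous
        theta = beta - math.atan2(un1, qt)          # turning of the streamline
        polar.append((theta, p1))
    return polar


ATM = 101325.0  # Pa
# Water (gamma = 7, P_inf = 3000 atm) at 1 atm, 1000 kg/m^3, flowing at 3000 m/s.
water = stiffened_shock_polar(1000.0, ATM, 3000.0, 7.0, 3000.0 * ATM)
theta_max = max(t for t, _ in water)
print(math.degrees(theta_max), water[-1][1])
```

Both ends of the branch return to zero deflection, at the upstream pressure and at the normal-shock pressure respectively, which is the bounded loop structure mentioned above.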


3.  The  Anomalous  Reflection  Wave 

Figure 3.1 illustrates the collision of a planar shock in water with an air bubble. When the shock first reaches the bubble, the two waves are tangent, and regular diffraction patterns (called diffraction nodes) are produced at the points of collision between the two waves. To leading order, the flow near a point of diffraction is described by the solution of a supersonic steady-state Riemann problem, as mentioned above. Here, the interaction produces a reflected Prandtl-Meyer wave. Figure 3.2 shows the set of wave curves used for the solution shown in figure 3.1.b. It is important to note that the existence of such a solution depends on the flow behind the incident shock wave being supersonic in a reference frame where the node is at rest. Because of the large difference in the compressibility of the two fluids (for this model at constant temperature, air is about 15,000 times as compressible as water), as long as the flow behind the incident shock remains supersonic, the waves produced by this interaction are prevented from interacting with the incident waves, and the flow downstream from this point remains nearly self-similar in a neighborhood of the node.


If $\beta$ denotes the instantaneous angle between the incident shock and the bubble, then the Mach number behind the incident shock is given by

$$M_1^2 = \left(\frac{m}{\rho_1 c_1}\right)^2 \left[1 + \left(\frac{\rho_1}{\rho_0}\right)^2 \cot^2\beta\right], \tag{3.1}$$

where the subscripts 0 and 1 refer to the states ahead of and behind the incident shock, respectively, $c$ is the sound speed, and $m$ is the mass flux across the shock. For most equations of state [Menikoff & Plohr, 1989], $m < \rho_1 c_1$, so the flow behind the shock will be subsonic if $\beta$ is sufficiently close to $\pi/2$. Thus a transition to an irregular wave pattern occurs before the shock reaches the equator of the bubble.
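Equation (3.1) is easy to evaluate once the state behind the incident shock is known. The sketch below does so for the 10 Kbar water shock of figure 3.1, assuming the stiffened polytropic parameters for water listed in section 5 ($\gamma = 7.0$, $P_\infty = 3000$ atm) and SI units; it reproduces the shocked water density of about 1.195 g/cc quoted in the figure caption. It also inverts (3.1) for the angle $\beta^*$ at which $M_1 = 1$. The function names are illustrative, not taken from the original code.

```python
import math

ATM = 101325.0  # Pa


def stiffened_hugoniot_state(rho0, p0, p1, gamma, p_inf):
    """Density, sound speed and mass flux behind a shock of pressure p1
    propagating into a stiffened polytropic gas in state (rho0, p0)."""
    a0, a1 = p0 + p_inf, p1 + p_inf  # shifted pressures
    rho1 = rho0 * ((gamma + 1) * a1 + (gamma - 1) * a0) / (
        (gamma - 1) * a1 + (gamma + 1) * a0)               # Hugoniot density
    m = math.sqrt((p1 - p0) / (1.0 / rho0 - 1.0 / rho1))   # Rayleigh line
    c1 = math.sqrt(gamma * a1 / rho1)
    return rho1, c1, m


def mach_behind(beta, rho0, rho1, c1, m):
    """Equation (3.1): node-frame Mach number behind the incident shock."""
    cot = math.cos(beta) / math.sin(beta)
    return (m / (rho1 * c1)) * math.sqrt(1.0 + (rho1 / rho0) ** 2 * cot ** 2)


def critical_angle(rho0, rho1, c1, m):
    """Angle beta* at which (3.1) gives M1 = 1; requires m < rho1 * c1."""
    cot2 = ((rho1 * c1 / m) ** 2 - 1.0) * (rho0 / rho1) ** 2
    return math.atan2(1.0, math.sqrt(cot2))


# 10 Kbar (1e9 Pa) shock in water at 1 atm and 1000 kg/m^3.
rho1, c1, m = stiffened_hugoniot_state(1000.0, ATM, 1.0e9, 7.0, 3000.0 * ATM)
beta_star = critical_angle(1000.0, rho1, c1, m)
print(rho1 / 1000.0, math.degrees(beta_star))
```

For these parameters the critical angle comes out at about 53.5 degrees, well short of the equator at $\beta = \pi/2$, in line with the statement above.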


When the flow behind the incident shock becomes subsonic, the leading edge of the reflected Prandtl-Meyer wave begins to overtake the incident shock wave from behind. This process dampens the incident shock and produces additional curvature in it. These effects are transmitted into the outgoing waves as well. The portion of the reflected rarefaction that overtakes the shock in a given period of time is just enough to restore the flow immediately behind the point of collision of the two waves to sonic. Since the sound speed in water is much higher than the sound speed in air, the flow in the air inside the bubble remains supersonic, and the transmitted wave continues to propagate downstream from the node. This prevents the formation of precursor-type waves such as those described in [Abd-El-Fattah & Henderson, 1978b].
As the interaction proceeds and the bubble interface continues to diverge from the incident shock, more and more of the rarefaction fan spreads out onto the incident shock wave, leading to the formation of a structure that is analogous to a Mach reflection with a non-centered reflected Prandtl-Meyer wave; see figure 3.3. Eventually, enough of the rarefaction overtakes the incident shock that the flow near the trailing edge of the Prandtl-Meyer wave becomes nearly sonic. When this happens, the trailing edge of the rarefaction wave is almost parallel to the incident shock, and to leading order the flow near the point of shock diffraction is a one-dimensional unsteady flow with a rarefaction wave overtaking a shock wave from behind. Thus, in finite time, the entire reflected rarefaction wave overtakes the incident shock. The Mach node corresponds to the spread-out wave where the non-centered rarefaction meets the incident shock, and the Mach stem corresponds to the portion of the incident shock from the trailing edge of the rarefaction wave to the fluid interface. For weak incident shocks the reflected rarefaction is of about the same strength as the incident wave, and this "anomalous reflection stem" is a sound wave.

There is experimental evidence of these anomalous waves. In particular, figure 14g of [Jahn, 1956] shows such a wave for the oblique diffraction of a planar shock through a thin membrane separating two gases.

The structure of this anomalous wave will be described in more detail in a forthcoming paper [Grove & Menikoff, 1988].

4.  The  Tracking  of  the  Anomalous  Reflection  Wave 

The  qualitative  discussion  of  the  anomalous  reflection  in  the  previous  section  can 
be  incorporated  into  a  front  tracking  code  to  give  an  enhanced  resolution  of  the 
interaction. 

Previous work [Grove, 1989] described the tracking of a regular shock-contact diffraction node. When a shock-contact diffraction node is identified at a given time step with time increment $dt$, the pair of incoming interacting waves (the incident shock and the material interface) are first propagated independently, ignoring their interaction. The intersection between the two propagated waves gives the time-updated position $p_0$ of the diffraction node. The displacement of the node position divided by $dt$ provides the node velocity and hence the Galilean transformation of the flow near the node into a frame where the node is stationary. If the state behind the incident shock is supersonic in this frame, it, together with the state on the opposite side of the material interface, provides data for a supersonic steady-state Riemann problem, whose solution determines the outgoing waves. The outgoing tracked waves are then modified to incorporate this solution.
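The geometric part of this update can be sketched in a few lines. The sketch assumes, for simplicity, that near the node each propagated front is a single straight segment (a real front tracker works with piecewise-linear curves), and it stops short of the steady Riemann solve itself. The helper names `segment_intersection` and `propagate_node` are hypothetical.

```python
def segment_intersection(a0, a1, b0, b1):
    """Intersection of segments a0-a1 and b0-b1 (2-D tuples), or None."""
    dax, day = a1[0] - a0[0], a1[1] - a0[1]
    dbx, dby = b1[0] - b0[0], b1[1] - b0[1]
    denom = dax * dby - day * dbx          # cross(da, db)
    if denom == 0.0:
        return None                        # parallel or collinear
    ex, ey = b0[0] - a0[0], b0[1] - a0[1]
    t = (ex * dby - ey * dbx) / denom      # parameter along segment a
    u = (ex * day - ey * dax) / denom      # parameter along segment b
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (a0[0] + t * dax, a0[1] + t * day)
    return None


def propagate_node(old_node, shock_seg, contact_seg, dt, lab_velocities):
    """Hypothetical regular-node update (geometry only).

    shock_seg and contact_seg are the independently propagated incident shock
    and material interface near the node, each a pair of endpoints. Returns
    the new node position p0, the node velocity, and the given lab-frame flow
    velocities Galilean-transformed into the frame where the node is at rest.
    """
    p0 = segment_intersection(*shock_seg, *contact_seg)
    if p0 is None:
        raise ValueError("propagated fronts do not intersect")
    node_vel = ((p0[0] - old_node[0]) / dt, (p0[1] - old_node[1]) / dt)
    node_frame = [(vx - node_vel[0], vy - node_vel[1]) for vx, vy in lab_velocities]
    return p0, node_vel, node_frame


# Usage with made-up numbers: old node at the origin, dt = 0.5.
p0, vel, rel = propagate_node((0.0, 0.0),
                              ((1.0, -1.0), (1.0, 1.0)),   # propagated shock
                              ((-1.0, 0.5), (2.0, 0.5)),   # propagated contact
                              0.5, [(3.0, 1.0)])
print(p0, vel, rel)
```

The node-frame velocities returned here are what would be checked for supersonic flow before the steady Riemann problem is set up.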

A bifurcation to an anomalous reflection is detected when the state behind the incident shock is subsonic in the frame of the node and the reflected wave from the previous time step is a Prandtl-Meyer wave. The first step in modeling this bifurcation is to propagate the leading edge of the reflected wave onto the incident shock. This is done as before, by finding the point of intersection $p_1$ between the two propagated curves. If (as is often the case) the reflected wave is untracked, it is recovered by calculating the characteristic through the old node position corresponding to the state behind the incident shock. This characteristic makes the Mach angle $\mu_1 = \arcsin(1/M_1)$ with the streamline through the node. It is assumed that the bifurcation occurs during the time step, so the Mach number $M_1$ at the beginning of the time step is greater than or equal to one, and $\mu_1$ is real. The leading edge of the reflected rarefaction moves with the sound speed in the direction normal to the characteristic. If the leading edge of the Prandtl-Meyer wave is tracked, it is disconnected from the original diffraction node, and a new node (called a cross node [Glimm et al., 1985]), corresponding to the oblique overtaking of a shock wave by a characteristic (zero-strength shock wave) of the same family, is installed at $p_1$.

The next step is to determine the states and position of the point of shock diffraction after the bifurcation. As the rarefaction expands onto the incident wave, the incident shock near the material interface is weakened and curves into the contact. The interaction slows down the node until the flow behind the incident shock at the node is sonic. Thus it suffices to compute a corrected node velocity or, equivalently, a corrected propagated node position that takes into account the shock-rarefaction interaction. Once this corrected position is determined, the flow downstream from the node is computed as for a regular diffraction.

For each number $s$, let $p(s)$ be the point on the propagated material interface that is located a distance $s$ from $p_0$ when measured along the curve, the positive direction being oriented away from the node into the region ahead of the incident shock. Let $\beta(s)$ be the angle between the tangent vector to the material interface at $p(s)$ and the directed line segment from $p(s)$ to $p_1$; see figure 4.1. Let $V(s)$ be the node velocity found by moving the diffraction node to position $p(s)$, and let $\mathbf{q}_0(s)$ be the velocity of the flow ahead of the incident shock in the frame that moves with velocity $V(s)$ with respect to the computational lab frame. The mass flux across the incident shock that makes an angle $\beta(s)$ with the upstream material interface is given by

$$m(s) = \rho_0\,|\mathbf{q}_0(s)|\,\sin\beta(s). \tag{4.1}$$

Given $m(s)$ and the state ahead of the incident shock, the state behind the shock, and hence its Mach number $M(s)$, can be found. The new node position is given by $p(s^*)$, where

$$M(s^*) \approx 1. \tag{4.2}$$

Finally, the state behind the incident shock with mass flux $m(s^*)$, together with the state on the opposite side of the contact, is used as data for a steady-state Riemann problem whose solution supplies the states and angles of the transmitted shock, the trailing edge of the reflected rarefaction, and the downstream material interface.
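Condition (4.2) is a scalar root-finding problem in $s$. The sketch below assumes plain bisection (the text does not say which solver is used) and a stiffened-gas fluid with the water parameters of section 5; $M(s)$ is built from (4.1) and the shock jump conditions, in the same way as (3.1). To keep the example self-contained, the geometry of figure 4.1 is replaced by a deliberately hypothetical toy in which the node-frame upstream speed $|\mathbf{q}_0|$ is frozen at 3000 m/s and $\beta(s) = 1.4 - s$; in the real construction both $\beta$ and $|\mathbf{q}_0|$ vary with $s$ through $p(s)$ and $V(s)$. All function names are illustrative.

```python
import math

ATM = 101325.0  # Pa


def state_behind(rho0, p0, m, gamma, p_inf):
    """(rho1, c1) behind a shock with mass flux m in a stiffened gas."""
    c0 = math.sqrt(gamma * (p0 + p_inf) / rho0)
    mn2 = (m / (rho0 * c0)) ** 2             # normal Mach number squared
    if mn2 < 1.0:
        raise ValueError("mass flux too small for a compressive shock")
    rho1 = rho0 * (gamma + 1) * mn2 / ((gamma - 1) * mn2 + 2)
    p1 = p0 + (p0 + p_inf) * 2 * gamma / (gamma + 1) * (mn2 - 1)
    return rho1, math.sqrt(gamma * (p1 + p_inf) / rho1)


def mach_behind_node(beta, q0, rho0, p0, gamma, p_inf):
    """Mach number behind the incident shock from (4.1) and the jump conditions."""
    m = rho0 * q0 * math.sin(beta)           # equation (4.1)
    rho1, c1 = state_behind(rho0, p0, m, gamma, p_inf)
    un1 = m / rho1                           # normal speed behind the shock
    qt = q0 * math.cos(beta)                 # tangential speed, continuous
    return math.hypot(un1, qt) / c1


def solve_sonic(mach_of_s, s_lo, s_hi, tol=1e-12, max_iter=200):
    """Bisection for M(s*) = 1, given M(s) - 1 changes sign on [s_lo, s_hi]."""
    f_lo = mach_of_s(s_lo) - 1.0
    if f_lo * (mach_of_s(s_hi) - 1.0) > 0.0:
        raise ValueError("M(s) - 1 does not change sign on the bracket")
    for _ in range(max_iter):
        s_mid = 0.5 * (s_lo + s_hi)
        f_mid = mach_of_s(s_mid) - 1.0
        if f_mid == 0.0 or s_hi - s_lo < tol:
            return s_mid
        if f_lo * f_mid < 0.0:
            s_hi = s_mid                     # root lies in the lower half
        else:
            s_lo, f_lo = s_mid, f_mid        # root lies in the upper half
    return 0.5 * (s_lo + s_hi)


# Hypothetical toy geometry (NOT the construction of figure 4.1):
# |q0| frozen at 3000 m/s, beta(s) = 1.4 - s, water parameters of section 5.
def toy_mach(s):
    return mach_behind_node(1.4 - s, 3000.0, 1000.0, ATM, 7.0, 3000.0 * ATM)


s_star = solve_sonic(toy_mach, 0.0, 0.8)
print(s_star, toy_mach(s_star))
```

In the toy, the unadjusted position $s = 0$ is subsonic, as in figure 4.1, and the solver moves the node until the flow behind it is sonic. Bisection is chosen only because it is robust once a sign change is bracketed; any scalar root finder would serve.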

The subsequent propagation of the anomalous reflection node is performed in the same way. The bifurcation simply repeats itself as more of the reflected rarefaction propagates up the incident shock. The leading edge of the reflected rarefaction wave that connects to the diffraction node is not tracked after the first bifurcation.

The secondary bifurcations that occur when the trailing edge of the rarefaction overtakes the incident shock are detected in two ways. If the incident shock is sufficiently weak, i.e., the normal shock Mach number is close to 1, then the numerically calculated upstream Mach number may be less than one. Physically, of course, the state ahead of the incident shock is always supersonic, but if it is nearly sonic, such numerical undershoot may occur. When this happens, the construction described above must be modified. The tracked trailing edge of the rarefaction is disengaged from the diffraction node and installed in a new overtake node, found by intersecting the propagated characteristic with the incident shock ahead. The residual shock strength for the portion of the incident shock behind the rarefaction wave is small. The diffraction node at the material interface reduces to the degenerate case of a sonic signal diffracting through a material interface, and the induced downstream waves are also sound waves. The second way in which the secondary bifurcation is detected occurs when the trailing edge of the rarefaction overtakes the shock. Here a new intersection between the incident shock and the trailing edge characteristic is produced. Again the tracked characteristic is disengaged from the diffraction node, and a new overtake node is installed at the point of intersection. Here the residual shock strength behind the rarefaction is positive. The diffraction at the material interface is non-trivial and will produce an additional expansion wave behind the original one. Most often this new expansion wave is not tracked.
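The detection rules described in this section can be collected into a single decision function. The sketch below is a hypothetical summary of the logic in the text, not code from the tracking package: its inputs (node-frame Mach numbers behind and ahead of the incident shock, whether the reflected wave is a Prandtl-Meyer wave, whether its trailing edge is tracked, and whether the propagated trailing-edge characteristic crosses the incident shock) are assumed to be computed elsewhere, and the names are illustrative.

```python
from enum import Enum


class NodeUpdate(Enum):
    REGULAR = "regular diffraction: solve the supersonic steady Riemann problem"
    ANOMALOUS = "anomalous reflection: solve M(s*) = 1 for the node position"
    SECONDARY_UNDERSHOOT = "secondary bifurcation: upstream Mach number below one"
    SECONDARY_CROSSING = "secondary bifurcation: trailing edge crosses incident shock"
    UNHANDLED = "not covered by the rules in the text"


def classify_node_update(mach_behind, mach_ahead, reflected_is_pm,
                         trailing_edge_tracked, trailing_crosses_shock):
    """Hypothetical summary of the detection rules of section 4.

    mach_behind / mach_ahead: node-frame Mach numbers behind and ahead of
    the incident shock. The three flags are assumed to be computed elsewhere.
    """
    # Secondary bifurcations need explicit handling only when the edges of
    # the reflected Prandtl-Meyer wave are tracked.
    if reflected_is_pm and trailing_edge_tracked:
        if mach_ahead < 1.0:              # numerical undershoot upstream
            return NodeUpdate.SECONDARY_UNDERSHOOT
        if trailing_crosses_shock:        # trailing edge overtakes the shock
            return NodeUpdate.SECONDARY_CROSSING
    if mach_behind >= 1.0:
        return NodeUpdate.REGULAR
    if reflected_is_pm:                   # subsonic behind, reflected PM wave
        return NodeUpdate.ANOMALOUS
    return NodeUpdate.UNHANDLED


print(classify_node_update(0.98, 1.3, True, False, False))
```

Checking the secondary cases first reflects that they arise only after an anomalous reflection has already formed.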

Some remarks about the amount of tracking of these diffraction nodes seem pertinent at this point. The secondary bifurcations described in the previous paragraph need only be dealt with explicitly when the edges of the reflected Prandtl-Meyer wave are tracked. The current algorithm assumes that, at a minimum, the two interacting incoming waves are tracked. At this extreme none of the outgoing waves are tracked; instead they are captured by the interior solver that is coupled to the front tracking method. In such a case the bifurcations occur automatically and the algorithm is much simpler. More commonly, the material interface separates different fluids, so the change in equation of state across this interface requires tracking the downstream portion of the material interface as well. Furthermore, any capturing method will spread the captured wave over several grid zones, thus reducing the resolution of the two-dimensional wave. This spreading is particularly pronounced at expansive waves such as Prandtl-Meyer waves [Glimm et al., 1987]. Also, unless the capturing algorithm is carefully designed, instabilities in the finite difference approximation can destroy the accuracy of the solution near the node. This is especially the case for stiff materials such as water. Tracking these waves seems to reduce these problems considerably. It also allows the use of a much coarser grid, which is important when the diffraction occurs in a small but important zone of a larger simulation and the entire region of diffraction extends over only a fraction of a grid block. The point of these remarks is that the amount of tracking is problem dependent, and a compromise can be made between the increased accuracy and stability of front tracking and the simplicity of a capturing algorithm.
5. Numerical Examples

Figure 5.1 shows a series of frames documenting the collision of a 10 Kbar shock wave with a bubble of air in water. The states ahead of the incident shock are at one atmosphere pressure and standard temperature. Under these conditions, water is about 1000 times as dense as air. During the initial stage of the interaction, regular diffraction patterns are produced. By real time 4.5 μsec an anomalous reflection has formed, and by 10 μsec the trailing edge of the rarefaction has also overtaken the incident shock. It is interesting to note that this interaction causes the bubble to collapse into itself. Longer simulations (not available at the time of this writing) show the bubble splitting in two (in three dimensions it forms a torus), with the resulting production of vorticity. Very long time simulations are expected to show the bubbles going into oscillation as they are overcompressed and then expand. This overcompression and expansion is important in the transfer of energy as a shock passes through a bubbly fluid. The first diffraction considerably dampens the shock, and some of the energy will eventually be returned to the shock wave in the form of compression waves generated by the expanding bubble. One goal of this research is to be able to perform simulations of such long-term behavior, which develops on time scales orders of magnitude greater than that of the shock diffraction itself.

Figure 5.2 shows the diffraction of an expanding underwater shock wave through the water's surface. The problem is initialized by placing the center of a 10 Kbar cylindrically expanding shock wave of radius 1 meter at 2 meters below the water's surface. Inside the cylindrical shock is a bubble of hot, dense air. The initial conditions outside the expanding shock are ambient, at one atmosphere pressure and normal temperature. In this simulation, the fluids are subject to a gravitational acceleration of 1g. The reflected Prandtl-Meyer wave is untracked. The pressure contour plots show that by 6 msec an anomalous reflection has developed. Another interesting feature of this problem is the acceleration of the bubble inside the shock wave by the reflected rarefaction wave. This causes the bubble to rise much faster than it would under gravity alone. As the bubble reaches the surface it begins to expand into the atmosphere. The expansion leads to the formation of a kink in the transmitted shock wave between the region ahead of the surfacing bubble and the rest of the wave. This kink is an untracked version of another elementary wave (a cross node) where two oblique shocks collide.

The two fluids in the simulations described above are modeled by what is called a stiffened polytropic equation of state [Harlow & Amsden, 1971; Plohr, 1988], where the pressure is given by

$$P = (\gamma - 1)\,\rho E - \gamma P_\infty. \tag{5.1}$$

If $P_\infty = 0$ this reduces to the more common polytropic equation of state. The values used for the EOS parameters are $\gamma_{\text{air}} = 1.4$, $P_{\infty,\text{air}} = 0$, $\gamma_{\text{water}} = 7.0$, and $P_{\infty,\text{water}} = 3000$ atm.
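For reference, here is a minimal sketch of the stiffened polytropic equation of state (5.1) with the parameters above, in SI units. The class and method names are illustrative. The sound speed formula $c^2 = \gamma (P + P_\infty)/\rho$ follows from (5.1) together with the isentropic relation $dE = -P\,dV$ from (2.3). With the ambient densities of figure 3.1 (1 g/cc for water, 0.0012 g/cc for air), it gives sound speeds of roughly 1460 m/s in water and 344 m/s in air, consistent with the remark in section 3 that sound travels much faster in water.

```python
import math
from dataclasses import dataclass

ATM = 101325.0  # Pa


@dataclass(frozen=True)
class StiffenedGas:
    """Stiffened polytropic EOS (5.1): P = (gamma - 1) rho E - gamma P_inf."""
    gamma: float
    p_inf: float  # Pa

    def pressure(self, rho, e):
        """Pressure from density (kg/m^3) and specific internal energy (J/kg)."""
        return (self.gamma - 1.0) * rho * e - self.gamma * self.p_inf

    def internal_energy(self, rho, p):
        """Inverse of (5.1) at fixed density."""
        return (p + self.gamma * self.p_inf) / ((self.gamma - 1.0) * rho)

    def sound_speed(self, rho, p):
        """c^2 = gamma (P + P_inf) / rho, the isentropic derivative of (5.1)."""
        return math.sqrt(self.gamma * (p + self.p_inf) / rho)


AIR = StiffenedGas(gamma=1.4, p_inf=0.0)
WATER = StiffenedGas(gamma=7.0, p_inf=3000.0 * ATM)

# Ambient densities from figure 3.1: air 0.0012 g/cc, water 1 g/cc.
print(AIR.sound_speed(1.2, ATM), WATER.sound_speed(1000.0, ATM))
```

Setting `p_inf=0.0`, as for air, recovers the ordinary polytropic gas.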
6. Conclusion and Open Questions

The diffraction of a shock wave in water through an air-water interface produces an interesting wave that is an analogue of a Mach reflection with a non-centered rarefaction. A first-order description of this wave can be incorporated into a front tracking algorithm to provide a high-quality resolution of the wave on a given grid. This allows an accurate initialization of the shocked interface, whose long-term unstable structure can then be studied.

There are several interesting mathematical questions associated with the anomalous reflection wave discussed in this paper. One would be to provide an analytical asymptotic description of the interaction between the two waves. This would include decay estimates for the incident shock strength as it is overtaken by the Prandtl-Meyer wave. A related question would be to provide a higher-order description of the interaction by taking into account the reflected waves produced when the rarefaction overtakes the shock. This would include an attempt to give a detailed description of the flow in the immediate wake of the anomalous reflection. This wake region, bounded by the leading-edge reflected sound wave behind the overtake node where the Prandtl-Meyer fan overtakes the shock, by the anomalous "Mach" stem, and by the material interface, is analogous to the Mach bubble produced in ordinary Mach reflection.

The author would like to thank Dr. Ralph Menikoff for many helpful discussions of this work, and Prof. James Glimm for his encouragement and physical insight into these problems.
Panels: (a) time 0.0 μsec; (b) time 0.15 μsec. Scale: 10 Δx = 10 Δy.

Figure 3.1. The collision of a shock wave in water with an air bubble. The fluids ahead of the shock are at normal conditions of 1 atm. pressure, with the density of water 1 g/cc and air 0.0012 g/cc. The pressure behind the incident shock is 10 Kbar with a shocked water density of 1.195 g/cc. The grid is 60x60.


Figure 3.2. The wave curves (shock polars) used in the solution of the steady-state Riemann problem in figure 3.1.b. Note that the shock polar for the transmitted shock in air is much lower and wider than the corresponding incident shock polar for water. Figure 3.2.b contains a detail from the lower section of figure 3.2.a that shows the air shock polar clearly. The Mach numbers for the incident and transmitted shocks are 2.7 and 11.4, respectively. The pressure at the midstate solution is 8.8 bars.


Panels: (a) time 0.0 μsec; (c) time 1.05 μsec; (d) time 1.55 μsec. Scale: 10 Δx = 10 Δy.

Figure 3.3. The production of an anomalous "Mach" reflection. A shock wave in water, with a pressure of 100 bars behind it (Mach number 1.09), is incident on a bubble of air. The upstream states are ambient at 1 atm. and standard densities. By time 1 μsec the trailing edge of the reflected Prandtl-Meyer wave has overtaken the incident shock, producing an analogue of an ordinary Mach reflection in which the reflected wave is a non-centered rarefaction. The grid is 60x60.
Panels: (c), (d). Scale: 0.5 Δx = 0.5 Δy.

Figure 4.1. A diffraction node initially at $p_{00}$ bifurcates into an anomalous reflection. The predicted new node position at $p_0$ yields a Mach number of 0.984 behind the incident shock. The leading edge of the reflected Prandtl-Meyer wave breaks away from the diffraction node to form an overtake node at $p_1$. The propagated position of the diffraction node is adjusted by a distance $s^* = 3.59 \times 10^{\ldots}$ to return the flow to sonic behind the node.
Panels: (a) time 0.0 μsec; (b) time 0.15 μsec. Scale: 10 Δx = 10 Δy.

Figure 5.1. Log(1 + pressure) contours for the collision of a shock wave in water with an air bubble. The fluids ahead of the shock are at normal conditions of 1 atm. pressure, with the density of water 1 g/cc and air 0.0012 g/cc. The pressure behind the incident shock is 10 Kbar with a shocked water density of 1.195 g/cc. The grid is 60x60.
Scale: 25 Δx = 25 Δy.

Figure 5.2. An underwater expanding shock wave diffracting through the water's surface. An expanding shock wave with an internal pressure of 10 Kbars and initial radius of 1 meter is installed at a depth of 2 meters below the water's surface. The external conditions are ambient at one atmosphere pressure and normal densities for the air and water. The boundary conditions are constant Dirichlet at the initial ambient values. The grid is 150x150.
Abstract

The Riemann problem plays an important role in understanding the wave structure of fluid flow. It is also a crucial step in some numerical algorithms for accurately and efficiently computing fluid flow: the Godunov method, the random choice method, and the front tracking method. The standard wave structure consists of shock and rarefaction waves. Due to physical effects such as phase transitions, which often are indistinguishable from numerical errors in an equation of state, anomalous waves may occur: "rarefaction shocks", split waves, and composites.

The anomalous waves may appear in numerical calculations as waves smeared out by either too much artificial viscosity or insufficient resolution. In addition, the equation of state may lead to instabilities of fluid flow. Since these anomalous effects due to the equation of state occur for the continuum equations, they can be expected to occur for all computational algorithms. The equation of state may be characterized by three dimensionless variables: the adiabatic exponent $\gamma$, the Grüneisen coefficient $\Gamma$, and the fundamental derivative $\mathcal{G}$. The fluid flow anomalies occur when inequalities relating these variables are violated.

1.  Introduction 

Fluid dynamics is governed by a first-order hyperbolic system of conservation laws [Courant & Friedrichs, 1948]. The wave structure in 1-D is determined by the Riemann problem, i.e., the initial value problem with two piecewise constant states. In addition to its role in qualitatively understanding fluid flow, the Riemann problem is crucial to achieving high accuracy in several numerical algorithms for the computation of fluid flow: the Godunov method [Godunov, 1959] and its descendants [Leer, 1979; Colella & Woodward, 1984], the random choice method [Glimm, 1965; Chorin, 1976], and the front tracking method [Chern et al., 1985].

The flux function for the fluid dynamic PDEs depends on the equation of state (EOS). This in turn determines the wave curves used to solve the Riemann problem. For the standard case, in which the isentropes are convex, the wave curve consists of the usual shock and rarefaction waves. Near phase transitions the isentropes may not be convex, and anomalous waves occur: "rarefaction shocks", split waves, and composites. Furthermore, the constraints of thermodynamics on the EOS are not sufficient to guarantee physically reasonable fluid flow, namely uniqueness and stability.

*  Supported  by  the  U.  S.  Department  of  Energy 
For numerical calculations the EOS should be regarded as input data. Simulations of
fluid flow with any numerical algorithm should approximate the continuum solution to the
PDEs for fluid dynamics. Many numerical algorithms tacitly assume the EOS has the standard
wave structure and may have difficulties for applications in which phase transitions
are important and anomalous waves occur. In addition, numerical errors in calculating
equations of state and the use of equations of state outside their range of validity may lead
to anomalous behavior in situations where it should not occur. Therefore, numerical
simulations may have qualitatively or physically incorrect solutions as a consequence of
the EOS. Some numerical difficulties may be avoided by preprocessing equations of state
to determine the regions in state space for which unphysical flow occurs. Then, for
applications in which the flow enters these abnormal regions, the EOS should be corrected,
rather than the numerical algorithm being artificially modified to compensate for
deficiencies in the EOS.

We begin in §2 by defining notation and stating the fluid equations. The important
variables for characterizing the EOS are defined. In §3 the theory described in [Menikoff &
Plohr, 1989] on the effect of the EOS on the wave structure of the PDEs is briefly
summarized. The numerical implications of the wave structure are described in §4.
Conclusions are stated in §5.

2.  Mathematical  Formulation 
2.1  Fluid  Equations 

We consider an ideal fluid in which viscosity and heat conduction may be neglected.
The fluid motion is governed by the equations for the conservation of mass, momentum and
energy. In conservation form the fluid dynamic equations are [Courant & Friedrichs, 1948;
Landau & Lifshitz, 1959]

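$$\partial_t q_j + \partial_{x_i} F_{ij} = 0,$$

where

$$q = \begin{pmatrix} \rho \\ \rho \mathbf{u} \\ \rho \mathcal{E} \end{pmatrix}, \qquad
F = \begin{pmatrix} \rho \mathbf{u} \\ \rho \mathbf{u} \otimes \mathbf{u} + P I \\ \rho \mathcal{E} \mathbf{u} + P \mathbf{u} \end{pmatrix},$$

and

ρ = Density,
u = Particle Velocity,
ℰ = E + ½u² = Total Specific Energy,
E = Specific Internal Energy,
P = Pressure.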

The material properties enter the dynamic equations through an incomplete equation of
state P(V,E), where V = 1/ρ is the specific volume. The flux function depends on the
EOS. Consequently, the EOS determines the wave structure of fluid flow.
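To make concrete the point that the material enters only through the flux, here is a minimal Python sketch of the 1-D form of these equations with the EOS passed in as a function P(V, E). The helper names are hypothetical, and the ideal-gas law is only an illustrative stand-in for a real-material EOS.

```python
from typing import Callable, Tuple

State = Tuple[float, float, float]     # conserved (rho, rho*u, rho*total) or primitive (rho, u, E)
EOS = Callable[[float, float], float]  # incomplete equation of state P(V, E)


def ideal_gas(V: float, E: float, gamma: float = 1.4) -> float:
    """Illustrative incomplete EOS (polytropic gas): P = (gamma - 1) E / V."""
    return (gamma - 1.0) * E / V


def conserved(rho: float, u: float, E: float) -> State:
    """q = (rho, rho u, rho (E + u^2/2)) in one space dimension."""
    return (rho, rho * u, rho * (E + 0.5 * u * u))


def primitive(q: State) -> State:
    """Recover (rho, u, E) from the conserved variables q."""
    rho, momentum, energy = q
    u = momentum / rho
    return (rho, u, energy / rho - 0.5 * u * u)


def flux(q: State, eos: EOS) -> State:
    """1-D flux F(q); the material enters only through P(V, E) with V = 1/rho."""
    rho, u, E = primitive(q)
    P = eos(1.0 / rho, E)
    total = E + 0.5 * u * u  # total specific energy
    return (rho * u, rho * u * u + P, rho * total * u + P * u)


if __name__ == "__main__":
    print(flux(conserved(1.0, 0.5, 2.5), ideal_gas))
```

Swapping in a different EOS changes only the `eos` argument, which is the sense in which the EOS is input data to the scheme.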
2.2  Equation  of  State

The equation of state may be characterized by the behavior of its isentropes in the
P-V plane. Even though the entropy, S, and temperature, T, may not be uniquely defined
for an incomplete EOS, an isentrope (constant S) is determined by the ODE dE = −P dV.
Three variables play an important role in determining the wave structure.

1.  The  adiabatic  exponent

$$\gamma = -\left(\frac{\partial \ln P}{\partial \ln V}\right)_S$$

is the negative slope of the isentrope in the ln P-ln V plane. It is a dimensionless
sound speed, c² = γPV.

2.  The  Grüneisen  coefficient

$$\Gamma = V \left(\frac{\partial P}{\partial E}\right)_V$$

is a measure of the spacing of the isentropes in the ln P-ln V plane.

3.  The  fundamental  derivative

$$\mathcal{G} = -\frac{1}{2} V \frac{(\partial^2 P/\partial V^2)_S}{(\partial P/\partial V)_S}$$

is a measure of the convexity of the isentropes in the ln P-ln V plane.

Thermodynamic stability requires that γ > 0. This implies the fluid equations are
hyperbolic. The standard case assumes the isentropes are convex, 𝒢 > 0. When Γ > 0 the
isentropes do not intersect.
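The three coefficients can be estimated from any incomplete EOS P(V, E) without knowing S or T, because derivatives along an isentrope follow from dE = −P dV, that is, d/dV along an isentrope equals ∂/∂V at fixed E minus P ∂/∂E at fixed V. The sketch below (hypothetical function name, central differences with an assumed relative step) applies that rule. For an ideal gas it should return γ, Γ = γ − 1 and 𝒢 = (γ + 1)/2.

```python
from typing import Callable, Tuple


def eos_coefficients(P: Callable[[float, float], float], V: float, E: float,
                     rel_step: float = 1e-4) -> Tuple[float, float, float]:
    """Estimate (gamma, Gamma, G) at the state (V, E) of an incomplete EOS P(V, E).

    Partial derivatives use central differences; derivatives along an isentrope
    use d/dV|_S = d/dV|_E - P d/dE|_V, which follows from dE = -P dV.
    """
    hV = rel_step * abs(V)
    hE = rel_step * abs(E) if E != 0.0 else rel_step
    p0 = P(V, E)
    pVp, pVm = P(V + hV, E), P(V - hV, E)
    pEp, pEm = P(V, E + hE), P(V, E - hE)
    P_V = (pVp - pVm) / (2.0 * hV)
    P_E = (pEp - pEm) / (2.0 * hE)
    P_VV = (pVp - 2.0 * p0 + pVm) / hV**2
    P_EE = (pEp - 2.0 * p0 + pEm) / hE**2
    P_VE = (P(V + hV, E + hE) - P(V + hV, E - hE)
            - P(V - hV, E + hE) + P(V - hV, E - hE)) / (4.0 * hV * hE)
    dP_S = P_V - p0 * P_E                              # (dP/dV)_S
    d2P_S = (P_VV - P_V * P_E - 2.0 * p0 * P_VE        # (d^2P/dV^2)_S
             + p0 * P_E**2 + p0**2 * P_EE)
    gamma = -V * dP_S / p0                             # adiabatic exponent
    Gamma = V * P_E                                    # Grueneisen coefficient
    G = -0.5 * V * d2P_S / dP_S                        # fundamental derivative
    return gamma, Gamma, G


if __name__ == "__main__":
    # ideal gas with gamma = 1.4: expect roughly (1.4, 0.4, 1.2)
    print(eos_coefficients(lambda V, E: 0.4 * E / V, 1.0, 2.5))
```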


3.  Wave  Curve 

The wave curve is the locus of states that may be joined to a fixed initial state by a
scale-invariant solution of the PDEs. For fluid dynamics the wave curve consists of two
connected branches corresponding to left- and right-facing (sound) waves. There is also a
linearly degenerate contact wave, which follows the particle trajectories. The solution to the
Riemann problem in 1-D is determined by the intersection in the P-u plane of the left
wave curve from the left state and the right wave curve from the right state.

In general, scale-invariant solutions are composites of two types of elementary waves:
continuous solutions called rarefactions or simple waves, and discontinuous solutions called
shock waves. Shocks are determined by algebraic equations, the Hugoniot jump conditions.
Simple waves correspond to isentropes and are determined by integrating an ODE along
a characteristic. For consistency it is important that the characteristic velocity, u ± c, be
monotonic. The variation of the Lagrangian wave speed ρc is given by

$$\mathcal{G} = \frac{1}{c}\left(\frac{\partial (\rho c)}{\partial \rho}\right)_S .$$

When 𝒢 > 0 the sound modes are linearly non-degenerate. Compressive waves steepen to
form shocks and expansive waves spread out to form rarefactions. When 𝒢 < 0 the nature
of the waves reverses: shock waves are expansive and “rarefaction waves” are compressive.
Finally, when 𝒢 changes sign, scale-invariant solutions include composite waves consisting
of rarefactions and sonic shocks propagating as a single entity.
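The two expressions given for 𝒢 (its definition in §2.2 and the Lagrangian wave-speed form) agree. The short worked check below, which assumes smooth isentropes, shows why the sign of 𝒢 decides whether ρc increases under compression. It uses (ρc)² = −(∂P/∂V)_S and ∂/∂V = −ρ² ∂/∂ρ:

$$
\begin{aligned}
2\rho c\left(\frac{\partial(\rho c)}{\partial V}\right)_S &= -\left(\frac{\partial^2 P}{\partial V^2}\right)_S ,\\
\mathcal{G} = -\frac{V}{2}\,\frac{(\partial^2 P/\partial V^2)_S}{(\partial P/\partial V)_S}
&= -\frac{V}{\rho c}\left(\frac{\partial(\rho c)}{\partial V}\right)_S
= \frac{1}{c}\left(\frac{\partial(\rho c)}{\partial \rho}\right)_S
= 1 + \frac{\rho}{c}\left(\frac{\partial c}{\partial \rho}\right)_S .
\end{aligned}
$$

For an ideal gas, P ∝ ρ^γ along an isentrope, so c ∝ ρ^{(γ−1)/2} and 𝒢 = (γ + 1)/2 > 0: the Lagrangian sound speed always increases with compression, which is why compressive waves steepen into shocks in the standard case.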

3.1  Standard  Case 

The standard case assumes 𝒢 > 0 and


'Y  H - ^  F  • 

^  2E  - 


Here we assume the origin of energy is chosen such that E → 0 as V → ∞. The wave
curve satisfying the entropy condition consists of rarefactions in expansion and shocks in
compression. The second condition implies the wave curve in the P-u plane is monotonic.
It is needed to ensure the Riemann problem has a unique solution [Smith, 1979].
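For a polytropic gas the standard case holds, so each wave curve is monotone in the P-u plane and the intersection can be found by simple bisection on P. The following sketch is the textbook exact solver for that EOS only (closed-form shock branch from the Hugoniot relations, rarefaction branch from the isentrope); the function names and the Sod test data are illustrative assumptions, and the method does not carry over unchanged to non-convex equations of state.

```python
import math
from typing import Tuple

Prim = Tuple[float, float, float]  # (rho, u, P)


def wave_curve_velocity_change(p: float, rho_k: float, p_k: float, gamma: float) -> float:
    """f_K(p): velocity change across the wave joining state K to pressure p (ideal gas).

    Shock branch (p > p_k) from the Hugoniot relations; rarefaction branch
    (p <= p_k) from integrating along the isentrope.
    """
    if p > p_k:
        a = 2.0 / ((gamma + 1.0) * rho_k)
        b = (gamma - 1.0) / (gamma + 1.0) * p_k
        return (p - p_k) * math.sqrt(a / (p + b))
    c_k = math.sqrt(gamma * p_k / rho_k)
    return 2.0 * c_k / (gamma - 1.0) * ((p / p_k) ** ((gamma - 1.0) / (2.0 * gamma)) - 1.0)


def riemann_star_state(left: Prim, right: Prim, gamma: float = 1.4) -> Tuple[float, float]:
    """Intersect the left and right wave curves in the P-u plane by bisection on P."""
    rho_l, u_l, p_l = left
    rho_r, u_r, p_r = right

    def mismatch(p: float) -> float:
        return (wave_curve_velocity_change(p, rho_l, p_l, gamma)
                + wave_curve_velocity_change(p, rho_r, p_r, gamma) + (u_r - u_l))

    lo, hi = 0.0, max(p_l, p_r)
    if mismatch(lo) >= 0.0:
        raise ValueError("initial data generate a vacuum")
    while mismatch(hi) < 0.0:  # mismatch increases with p, so expand until bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    p_star = 0.5 * (lo + hi)
    u_star = 0.5 * (u_l + u_r) + 0.5 * (wave_curve_velocity_change(p_star, rho_r, p_r, gamma)
                                        - wave_curve_velocity_change(p_star, rho_l, p_l, gamma))
    return p_star, u_star


if __name__ == "__main__":
    # Sod shock-tube data, a standard test case
    print(riemann_star_state((1.0, 0.0, 1.0), (0.125, 0.0, 0.1)))
```

For the Sod data the bisection converges to P* ≈ 0.303 and u* ≈ 0.927, the values usually quoted for this test. Monotonicity of the wave curves is exactly what makes the bracketing step safe.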

3.2  Anomalous  Waves 

When a material undergoes a phase transition there is a jump in the sound speed,
c_mixed < c_pure. As a result, when an isentrope in the P-V plane passes through the
saturation boundary it suffers a kink, a discontinuity in the slope. Consequently, 𝒢 contains
a δ-function singularity. Depending on the sign of the kink the isentrope may become
non-convex. When the isentrope is non-convex, some single shocks are unstable and split
into two shocks of the same family. Similarly, a convex kink results in split rarefactions.

For some materials, 𝒢 < 0 near a phase transition and varies smoothly. In this case the
wave speed is not monotonic and some single shocks and rarefactions are unstable, resulting
in composite waves. Replacing unstable shocks and rarefactions with split waves and
composites results in a unique continuous wave curve [Wendroff, 1972]. This corresponds
to the extended entropy condition of Liu [1975] and Oleinik [1959]. The wave curve of
stable scale-invariant states may then be used with the standard construction to solve the
Riemann problem.
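The envelope construction behind the extended entropy condition is easiest to see for a scalar conservation law u_t + f(u)_x = 0 with a non-convex flux, used here only as an analogue of non-convex isentropes (it is not the fluid-dynamic wave curve itself). Under Oleinik's condition, for u_L < u_R the solution follows the lower convex envelope of f between the two states: chords are shocks, arcs where the envelope touches f are rarefactions, and a chord tangent to an arc is a sonic shock attached to a rarefaction, that is, a composite wave. A minimal sketch, assuming a uniform sampling of f and a hypothetical function name:

```python
from typing import Callable, List


def riemann_waves_scalar(f: Callable[[float], float], u_left: float, u_right: float,
                         n: int = 2001) -> List[dict]:
    """Waves for u_t + f(u)_x = 0 with u_left < u_right via the lower convex envelope of f.

    Chords of the envelope are shocks; stretches where the envelope follows f are
    rarefactions. A chord tangent to f where an arc begins is a sonic shock.
    """
    if not u_left < u_right:
        raise ValueError("this sketch only treats u_left < u_right")
    xs = [u_left + (u_right - u_left) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    hull: List[int] = []
    for i in range(n):  # Andrew's monotone chain, lower hull
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = (xs[a] - xs[o]) * (ys[i] - ys[o]) - (ys[a] - ys[o]) * (xs[i] - xs[o])
            if cross <= 0.0:
                hull.pop()  # point a lies on or above the chord o-i: not on the envelope
            else:
                break
        hull.append(i)
    waves: List[dict] = []
    for a, b in zip(hull, hull[1:]):
        speed = (ys[b] - ys[a]) / (xs[b] - xs[a])
        if b - a > 1:  # envelope skips grid points: a shock with Rankine-Hugoniot speed
            waves.append({"kind": "shock", "u_a": xs[a], "u_b": xs[b], "speed": speed})
        elif waves and waves[-1]["kind"] == "rarefaction":
            waves[-1]["u_b"] = xs[b]       # extend the current rarefaction fan
            waves[-1]["speed_b"] = speed
        else:
            waves.append({"kind": "rarefaction", "u_a": xs[a], "u_b": xs[b],
                          "speed_a": speed, "speed_b": speed})
    return waves


if __name__ == "__main__":
    for wave in riemann_waves_scalar(lambda u: u ** 3, -1.0, 1.0):
        print(wave)
```

For f(u) = u³ with u_L = −1 and u_R = 1 the envelope gives a sonic shock from −1 to 0.5 (speed 0.75, equal to f′(0.5)) followed by a rarefaction from 0.5 to 1, a composite wave in the sense used above.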

3.3  Shock  Instability 

In 2-D a new mode of instability is possible [Fowles, 1981; Kontorovich, 1957; Majda
& Rosales, 1983]. When the condition

$$\gamma > \Gamma + 1$$

is violated, some shocks are unstable to the development of transverse waves. This is
qualitatively similar to what is observed experimentally for detonations [Fickett & Davis,
1979]. Another result is that on the shock polar, P at the sonic point is greater than P
at the maximum turning angle. As a consequence an unusual wave pattern is possible, a
Mach configuration with a reflected rarefaction. On the other hand, when shocks are 2-D
stable the 1-D Riemann problem has a unique solution.

4.  Numerical  Implications 

The equation of state is input data for numerical calculations. It is not surprising that
an incorrect EOS will result in an unphysical simulation of the fluid flow. Two important
questions are: (1) given an EOS, how can one determine whether it is physically reasonable;
and (2) what are the symptoms of an unphysical EOS in a numerical simulation? One of the
interesting aspects of these problems is that in a simulation the fluid flow samples only a
region of state space. Thus an EOS may be adequate for one application and unacceptable
for another. In general it is important to determine the domain of validity of an EOS.

For applications with real materials the correct EOS may be experimentally determined.
However, accurate EOS experiments are difficult to perform and expensive, and data are
available in only a limited range of state space. As a consequence, to construct an EOS one
resorts to extrapolating data, semi-empirical fits, and theoretical models. A minimal
requirement on any EOS is thermodynamic consistency and stability. This is difficult to check
for only the incomplete part of the equation of state needed to simulate ideal fluid flow. It is
also insufficient to guarantee the behavior of fluid flow observed experimentally. Simple
conditions on the incomplete EOS are based on obtaining the correct wave structure and
shock stability. These are at least sufficient to obtain qualitatively correct fluid flow, though
the simulation may still be quantitatively inaccurate.

Similar difficulties may occur when the domain of an equation of state is extended by
extrapolating empirical fits outside their range of validity or by patching together theoretical
models. Several conditions on the EOS should be checked. Consider the isentropes in the
P-V plane.

(1) If the slope of an isentrope becomes positive, then γ < 0 and the fluid equations are
no longer hyperbolic. This implies the adiabatic compressibility is negative and results in
numerical instabilities: compressing a fluid element lowers its pressure, which causes the
fluid element to compress further until either the flow leaves the elliptic region of state
space or catastrophic failure occurs.

(2) If the slopes of the isentropes have kinks, which are physically indistinguishable from a
phase transition, then split shocks and rarefactions may develop.

(3) If the isentropes are not convex, then the extended entropy condition needs to be imposed.
Because of the non-uniqueness, different numerical algorithms may give different solutions,
depending, for example, on the form of artificial viscosity used to impose the entropy
condition. In addition, “rarefaction shocks” are possible. Algorithms which rely on artificial
viscosity to smear out shocks but only use it in compression will be unstable. Composites
consisting of sonic shocks followed by compression waves, which do not steepen because
𝒢 < 0, may have the appearance of shocks smeared out by too much artificial viscosity or
too little mesh resolution.

(4) If γ > 0 and 𝒢 > 0 but the second condition of the standard case (§3.1) is violated,
then 1-D fluid dynamics does not have a unique solution. Numerical calculations may not
converge under mesh refinement or may be unusually sensitive to initial data.

(5) If γ > Γ + 1 is violated, then shocks are 2-D unstable. In numerical calculations
transverse waves would develop along shock waves. In extreme cases diverging shocks speed
up and converging shocks slow down, resulting in a ripple instability of shock waves.

(6) If 2γ > Γ is violated, then the shock Hugoniot may have disconnected branches, giving
rise to further non-uniqueness of fluid flow.

It should be emphasized that even simple, analytic, seemingly reasonable equations of state
can give rise to anomalous and unphysical fluid flow. One example is analyzed in detail by
[Menikoff & Plohr, 1989]. It consists of a Grüneisen EOS with constant Γ and a linear
u_s-u_p fit for the principal shock Hugoniot.
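As a sketch of the kind of EOS referred to, the following constructs P(V, E) in Grüneisen form with constant Γ, referenced to a principal Hugoniot with a linear fit u_s = c₀ + s u_p. It uses the Rankine-Hugoniot relations from a reference state at P = 0, E = 0 and specific volume V₀. The nondimensional parameter values and function names are illustrative assumptions. Note that the reference pressure is singular at the limiting compression V = V₀(1 − 1/s).

```python
from typing import Callable, Tuple


def make_us_up_gruneisen(rho0: float, c0: float, s: float, Gamma: float
                         ) -> Tuple[Callable[[float, float], float],
                                    Callable[[float], Tuple[float, float]]]:
    """Return (P(V, E), hugoniot(V)) for a constant-Gamma Grueneisen EOS.

    Principal Hugoniot from V0 = 1/rho0, P0 = 0, E0 = 0 with u_s = c0 + s u_p:
        eta = 1 - V/V0,  P_H = rho0 c0^2 eta / (1 - s eta)^2,  E_H = P_H (V0 - V) / 2.
    """
    V0 = 1.0 / rho0

    def hugoniot(V: float) -> Tuple[float, float]:
        eta = 1.0 - V / V0
        denom = 1.0 - s * eta
        if denom <= 0.0:
            raise ValueError("beyond the limiting compression V0 (1 - 1/s)")
        P_H = rho0 * c0**2 * eta / denom**2
        return P_H, 0.5 * P_H * (V0 - V)

    def pressure(V: float, E: float) -> float:
        # off the Hugoniot, offset by the Grueneisen term (Gamma / V) (E - E_H)
        P_H, E_H = hugoniot(V)
        return P_H + (Gamma / V) * (E - E_H)

    return pressure, hugoniot


if __name__ == "__main__":
    P, H = make_us_up_gruneisen(rho0=1.0, c0=1.0, s=1.5, Gamma=2.0)
    print(H(0.8), P(0.8, 0.0))
```

Such a P(V, E) looks innocuous, which is the point of the example in the text: its wave structure has to be checked, not assumed.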

Even when anomalous waves do not occur, other aspects of the EOS may be important.
Some numerical algorithms use approximate Riemann solvers for efficiency. Typically,
approximate Riemann solvers tacitly assume the isentropes are convex and γ is slowly varying,
or that the wave curve in the P-u plane is convex. Conditions on the EOS for monotonicity
of thermodynamic variables along the wave curve have been worked out in [Menikoff &
Plohr, 1989].
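As an illustration of such tacit assumptions, the simplest approximate Riemann solver replaces each wave curve in the P-u plane by a straight line through its initial state with slope ∓ρc (the acoustic impedance). This is reasonable only when ρc varies little across the waves. A minimal sketch with hypothetical names; the sound speeds are passed in, so any EOS can supply them.

```python
from typing import Tuple

Prim = Tuple[float, float, float]  # (rho, u, P)


def acoustic_star_state(left: Prim, right: Prim,
                        c_left: float, c_right: float) -> Tuple[float, float]:
    """Linearized (acoustic) approximation to the Riemann star state (P*, u*).

    Left wave:  P* - P_L = -Z_L (u* - u_L);  right wave:  P* - P_R = +Z_R (u* - u_R),
    with impedances Z = rho c. Accurate only for weak waves.
    """
    rho_l, u_l, p_l = left
    rho_r, u_r, p_r = right
    z_l, z_r = rho_l * c_left, rho_r * c_right
    u_star = (z_l * u_l + z_r * u_r + p_l - p_r) / (z_l + z_r)
    p_star = (z_r * p_l + z_l * p_r + z_l * z_r * (u_l - u_r)) / (z_l + z_r)
    return p_star, u_star


if __name__ == "__main__":
    # Sod data with ideal-gas sound speeds c = sqrt(1.4 P / rho)
    print(acoustic_star_state((1.0, 0.0, 1.0), (0.125, 0.0, 0.1), 1.4 ** 0.5, 1.12 ** 0.5))
```

For the Sod data this gives P* ≈ 0.19, whereas the exact star pressure for an ideal gas is about 0.303, a reminder that the linearization is a weak-wave approximation.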

5.  Conclusions 

Equations of state have the potential for causing difficulties in numerical simulations of
fluid flow. These may take the form of qualitatively incorrect wave structure or instabilities
of shock waves. When non-convexity of an isentrope (𝒢 < 0) is physical, such as may occur
near a phase transition, the numerical algorithm must be capable of producing “rarefaction
shocks” and the stable split or composite waves rather than the unstable elementary waves.
Problems due to physically inadequate EOS or numerical errors in EOS can be avoided by
preprocessing the equation of state to determine the regions of state space with anomalous
properties. This entails checking where one of the inequalities γ > 0, 𝒢 > 0, or γ > Γ + 1
is violated. If in a simulation the flow enters an anomalous region of state space, then the
EOS needs to be corrected.
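Preprocessing of this kind can be as simple as evaluating γ, Γ and 𝒢 on a grid of states covering the region a simulation is expected to sample, then flagging violations. The sketch below assumes those coefficients have already been computed (for example by finite differences of P(V, E)) and checks the inequalities named in this paper. The uniqueness condition of the standard case is not included. The tolerance is an assumed parameter, needed because an ideal gas lies exactly on γ = Γ + 1.

```python
from typing import Dict, List, Tuple


def eos_flags(gamma: float, Gamma: float, G: float, tol: float = 1e-9) -> List[str]:
    """Return the conditions from the text that a single state violates."""
    flags = []
    if gamma <= tol:
        flags.append("gamma <= 0: not hyperbolic, negative adiabatic compressibility")
    if G <= tol:
        flags.append("G <= 0: non-convex isentrope, split or composite waves")
    if gamma < Gamma + 1.0 - tol:
        flags.append("gamma < Gamma + 1: shocks may be 2-D unstable")
    if 2.0 * gamma < Gamma - tol:
        flags.append("2 gamma < Gamma: Hugoniot may have disconnected branches")
    return flags


def scan(table: Dict[Tuple[float, float], Tuple[float, float, float]],
         tol: float = 1e-9) -> Dict[Tuple[float, float], List[str]]:
    """table maps a state (V, E) to precomputed (gamma, Gamma, G); keep only flagged states."""
    result = {}
    for state, coeffs in table.items():
        flags = eos_flags(*coeffs, tol=tol)
        if flags:
            result[state] = flags
    return result


if __name__ == "__main__":
    print(scan({(1.0, 2.5): (1.4, 0.4, 1.2), (0.5, 1.0): (1.2, 0.9, 0.8)}))
```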

There are several important open questions. Analyses of numerical algorithms have
assumed the fluid equations are genuinely nonlinear, with a unique solution determined by
an entropy condition; see, e.g., [Harten, et al., 1983]. The analysis needs to be generalized to
the non-convex case. Some algorithms (for example, the typical Godunov method) use
approximate Riemann solvers for efficiency. This is reasonable when the method averages over
a cell and does not use the full detail of the Riemann solution. The properties of an
approximate Riemann solver and the artificial viscosity required to obtain the correct
wave structure for a general EOS need further study. For other algorithms, such as the
front tracking method, which require a good Riemann solver, either better equations of
state with accurate derivatives or robust methods of utilizing the EOS are needed.
ABSTRACT. There is a significant level of interest in the analytical and numerical
modeling of lower-frequency atmospheric acoustic propagation in battlefield environments.
Ray-based models, because of their frequency limitations, do not always give an adequate
prediction of quantities such as sound pressure or intensity levels. However, the parabolic
approximation method, widely used in ocean acoustics and often more accurate than ray
models for frequencies of interest, can be applied to acoustic propagation in the atmosphere.
We discuss appropriate physical and asymptotic conditions under which this model is valid.
Modifications of an existing implicit finite-difference implementation for computing solutions
to the parabolic approximation are discussed. In addition, we present calculations of acoustic
intensity levels in a windy atmosphere and contrast the results with those of ray theory.


1. INTRODUCTION. The propagation of low frequency sound through the earth’s
atmosphere over long distances is a problem with numerous applications. In many instances,
acoustic propagation occurs in environments which may be characterized by winds,
atmospheric turbulence, extremes of weather, and other natural and man-made atmospheric
variations, as well as irregular topography and terrain structure. These environmental
variations are typically range- as well as height-dependent, and can profoundly affect the
behavior of sound waves. Geometrical acoustics, or classical ray theory, is one approach that
has been commonly used to study atmospheric acoustics. Unfortunately, the approximations
under which the ray equations hold are valid only for sufficiently high source frequencies. At
lower frequencies, where diffraction effects are especially important, other mathematical
models can provide more accurate and useful results.


2. PARABOLIC APPROXIMATION METHOD. An alternative approach to low-frequency
propagation modeling is known as the parabolic approximation method (PAM).
This method, originally developed for studies of tropospheric radio wave propagation,¹
exploits characteristic features of the propagation medium associated with the formation of a
waveguide. Atmospheric acoustic waveguides can be formed by certain meteorological
conditions either with or without boundary interaction. Within such a waveguide, sound waves
may propagate to relatively large distances with significant amplitudes. The parabolic
approximation has been successfully applied to a broad variety of problems in ocean acoustics,²
where many features occur that are analogous to those in atmospheric acoustics.

Let p(r, z) be the acoustic pressure caused by the presence of a point source in a stratified,
moving atmosphere, where r and z denote the range and height in cylindrical coordinates.
We will confine our attention to a vertical plane containing the source and parallel to the
wind motion. In addition, we assume that the sound speed is independent of azimuth.
We consequently deal with two-dimensional sound propagation. The time-independent wave
field, denoted as A, is obtained by assuming that the source is harmonic with frequency f,
so that p = A exp(2πift). It can be shown that A satisfies the reduced wave equation


V^A  -f  klnW  +  -  2i^^A„  =  0, 

Co  Cq  dz 


(1) 


where c₀ is a reference sound speed, k₀ = 2πf/c₀ is a reference wave number, c(r,z) is the
sound speed, n(r,z) = c₀/c(r,z) is the index of refraction, and u₀(z) is the wind speed.
Furthermore, it can be shown that away from the source, the quantity A takes on the
asymptotic form

$$A(r,z) \sim \psi(r,z)\,\frac{e^{i k_0 r}}{\sqrt{r}}. \qquad (2)$$

Equation (2) is an essential feature of the parabolic approximation, in which the quantity ψ is
related to the slow-scale (i.e., over many wavelengths) variation in the acoustic pressure.
Furthermore, through careful scaling and asymptotic arguments, it can also be shown that ψ
satisfies a family of parabolic equations (PEs). Details of the complete derivation of this
family of PEs in an inhomogeneous moving medium can be found in Refs. 3 and 4. For the
numerical examples considered in the next section, the appropriate member of this family is
given by

$$2 i k_0 \psi_r + \psi_{zz} + k_0^2 (\hat{n}^2 - 1)\,\psi = 0, \qquad (3)$$

where

$$\hat{n} = c_0/\hat{c}, \qquad (4)$$

with

$$\hat{c} = c + u_0. \qquad (5)$$

The quantity ĉ is called the effective sound speed profile (ESSP).

Several numerical algorithms for solving Eq. (3) have been found useful and are widely
implemented. One employs implicit finite differences,⁵ using a Crank-Nicolson scheme to
march the solution forward in range. This method is well suited to many propagation
situations, for example, those involving irregularly shaped boundaries. From this algorithm
we determine ψ; then

$$p(r,z) = \psi(r,z)\,\frac{e^{i k_0 r}}{\sqrt{r}}$$

is the complex-valued pressure field, and finally the relative intensity I is defined as

$$I = 20 \log_{10} \left| \frac{p(r,z)}{p_{\mathrm{ref}}} \right|, \qquad (6)$$
Figure  1:  An  atmospheric  sound  channel.

where p_ref is the pressure at 1 m from the source. The quantity I will be computed and
discussed in the next section.
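The Crank-Nicolson marching of Eq. (3) can be sketched in a few lines. This is a minimal illustration, not the implementation used here. It assumes a uniform height grid, a rigid ground (ψ_z = 0, imposed with a ghost point), a pressure-release surface ψ = 0 one grid step above the top point, absorption represented by a positive imaginary part added to ñ², and a Gaussian starting field with an image source. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def solve_tridiagonal(lower: Sequence[complex], diag: Sequence[complex],
                      upper: Sequence[complex], rhs: Sequence[complex]) -> List[complex]:
    """Thomas algorithm; lower[i] = A[i+1, i], upper[i] = A[i, i+1]."""
    n = len(diag)
    c, d = [0j] * n, [0j] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0j
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0j] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def gaussian_starter(k0: float, z: Sequence[float], zs: float) -> List[complex]:
    """Assumed starting field: Gaussian at the source height plus its ground image."""
    return [complex(math.sqrt(k0) * (math.exp(-0.5 * (k0 * (zj - zs)) ** 2)
                                     + math.exp(-0.5 * (k0 * (zj + zs)) ** 2))) for zj in z]


def march_pe(k0: float, dz: float, n_hat_sq: Sequence[complex], psi0: Sequence[complex],
             dr: float, n_steps: int) -> List[complex]:
    """Crank-Nicolson marching in range of 2 i k0 psi_r + psi_zz + k0^2 (n^2 - 1) psi = 0."""
    M = len(psi0)
    main = [-2.0 / dz**2 + k0**2 * (n2 - 1.0) for n2 in n_hat_sq]
    upper = [1.0 / dz**2] * (M - 1)
    upper[0] = 2.0 / dz**2          # ghost point psi_{-1} = psi_1 gives psi_z = 0 at ground
    lower = [1.0 / dz**2] * (M - 1)
    coef = 1j * dr / (4.0 * k0)     # (dr/2) * i/(2 k0), since psi_r = (i/(2 k0)) A psi
    diag_lhs = [1.0 - coef * m for m in main]
    upper_lhs = [-coef * u for u in upper]
    lower_lhs = [-coef * v for v in lower]
    psi = [complex(v) for v in psi0]
    for _ in range(n_steps):
        rhs = [(1.0 + coef * main[j]) * psi[j] for j in range(M)]
        for j in range(M - 1):
            rhs[j] += coef * upper[j] * psi[j + 1]
            rhs[j + 1] += coef * lower[j] * psi[j]
        psi = solve_tridiagonal(lower_lhs, diag_lhs, upper_lhs, rhs)
    return psi


def relative_level(psi_j: complex, k0: float, r: float, p_ref: float) -> float:
    """Eq. (6) with p = psi exp(i k0 r) / sqrt(r); p_ref must be supplied by the caller."""
    p = psi_j * cmath.exp(1j * k0 * r) / math.sqrt(r)
    return 20.0 * math.log10(abs(p) / p_ref)
```

For example, at f = 10 Hz and c₀ = 330 m s⁻¹, k₀ = 2πf/c₀ ≈ 0.19 m⁻¹ (a 33 m wavelength), so a height step of a few meters resolves the field. Without absorption the scheme conserves the weighted sum Σ w_j|ψ_j|² with w₀ = ½ at the ground, which makes a quick correctness check.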

3. NUMERICAL EXAMPLES. Figure 1 depicts an idealized atmospheric acoustic
waveguide. We note here that this waveguide is similar to one used as a model in Ref. 6,
a study of the downwind propagation of low frequency noise from a wind turbine at a test
site in Wyoming. A cw sound source is located 40 m above a horizontal, perfectly reflecting
ground surface. The air is assumed to be isospeed with c₀ = 330 m s⁻¹. The atmosphere
moves within the indicated plane with a logarithmic velocity profile, a modeling assumption
often used for the vertical structure of winds:

$$u_0 = K v_f \ln\left(1 + \frac{z}{z_0}\right), \qquad (7)$$

where K = 2.5, v_f = 0.64 m s⁻¹, and z₀ = 0.1 m. As shown, the channel is bounded above
by a horizontal, artificial, pressure-release surface of height h, beneath which is an artificial
absorbing layer. This absorbing layer is designed to eliminate reflections that would otherwise
occur from the pressure-release surface at the top of the waveguide. This combination
boundary model is widely used to simulate bottom boundaries in ocean acoustics and,
modified by us to function as a surface model, is a feature of the numerical implementation
which we use for our calculations.
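Eq. (7) and the effective index entering Eq. (3) are easy to tabulate. A minimal sketch with hypothetical function names, assuming isospeed air (c = c₀) and modeling the upwind direction, as a simplifying assumption, by reversing the sign of u₀ in ĉ = c + u₀:

```python
import math


def log_wind(z: float, K: float = 2.5, v_f: float = 0.64, z0: float = 0.1) -> float:
    """Eq. (7): u0(z) = K v_f ln(1 + z/z0), in m/s for z in m."""
    return K * v_f * math.log(1.0 + z / z0)


def effective_index_sq(z: float, c0: float = 330.0, downwind: bool = True) -> float:
    """n_hat^2 = (c0 / c_hat)^2 with c_hat = c + u0 and isospeed air c = c0.

    Upwind propagation is modeled (an assumption) by reversing the sign of u0.
    """
    u0 = log_wind(z)
    c_hat = c0 + (u0 if downwind else -u0)
    return (c0 / c_hat) ** 2


if __name__ == "__main__":
    for z in (0.0, 10.0, 100.0, 500.0):
        print(z, round(log_wind(z), 2), round(effective_index_sq(z), 4))
```

With these parameters u₀(500 m) ≈ 13.6 m s⁻¹. Downwind, ĉ increases with height, so ñ² < 1 and decreases upward, the downward-refracting profile that traps sound near the ground.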

A  detailed  picture  of  the  sound  field  from  a  source  with  frequency  10  Hz  when  no  wind 
is  present  can  be  seen  in  Fig.  2.  Relative  intensity  is  displayed  as  level  curves  between  a 
Figure  2:  Level  curves  of  relative  intensity  in  the  r-z  plane.  No  wind  is  present.

rigid ground and an artificial surface at height 1000 m. The maximum range shown is 10
km. The artificial absorbing layer is 500 m thick, beginning at height 500 m and extending
vertically to the artificial surface. The intensity within the artificial absorbing layer is shown
for completeness, but we emphasize that the significant portion of the solution is located
in the vicinity of the ground surface. Note the regular way in which intensity decreases with
increasing range. It can be shown that this corresponds to spherical spreading of the sound,
which is entirely expected from a point source in a homogeneous motionless atmosphere.

In the next illustration, a logarithmic wind profile is present with a maximum wind speed
of 14 m s⁻¹. The level curves of intensity downwind from the source are shown in Fig. 3. The
boundary locations and layer thickness are as in Fig. 2. Note that a substantial change in the
intensity pattern has occurred near the ground surface. Intensity is seen to decrease much
more slowly with increasing range. In fact, this intensity pattern can be shown to correspond
roughly to cylindrical spreading of the sound. This effect is caused by the direction (and
magnitude) of the wind. For sufficiently high frequencies, geometrical acoustics predicts
that ray paths would be curved toward the ground, with many rays repeatedly striking the
ground. At the source frequency of 10 Hz used here, ray theory is no longer applicable.
Nonetheless, the wind still serves to focus the sound near the ground, so that the intensity
there is substantially larger than in the no-wind case discussed above.
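For reading these intensity plots, the two reference spreading laws can be written in terms of the relative intensity of Eq. (6). The sketch below assumes |p| ∝ 1/r (spherical) or |p| ∝ 1/√r (cylindrical) and references both at 1 m; the function names are illustrative.

```python
import math


def spherical_level(r: float) -> float:
    """Relative level 20 log10(|p|/p_ref) for |p| proportional to 1/r, p_ref at r = 1 m."""
    return -20.0 * math.log10(r)


def cylindrical_level(r: float) -> float:
    """Same for |p| proportional to 1/sqrt(r) (cylindrical spreading), referenced at 1 m."""
    return -10.0 * math.log10(r)


if __name__ == "__main__":
    for r in (1.0e2, 1.0e3, 1.0e4):
        print(r, spherical_level(r), cylindrical_level(r))
```

Each doubling of range costs about 6 dB under spherical spreading but only about 3 dB under cylindrical spreading. At 10 km the two idealized laws differ by 40 dB.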

When  the  wind  direction  is  reversed,  an  entirely  different  result  is  encountered.  Figure  4 
depicts  level  curves  of  intensity  for  the  channel  region  upwind  from  the  source.  Note  that  the 
intensity  decreases  more  rapidly  near  the  ground  surface  for  increasing  range  when  compared 
Figure  4:  Level  curves  of  relative  intensity  in  the  r-z  plane  upwind  from  the  source.
Figure  5:  Relative  intensity  I  versus  range  r  at  a  receiver  on  the  ground.  No  wind  is  present.

to the no-wind case shown in Fig. 2. This decrease is stronger than spherical spreading. In
fact, geometrical acoustics predicts the existence of a shadow zone (a “zone of silence”)
beginning immediately upwind from the source. As we noted before, the ray model is not
applicable at our source frequency. In fact, sound energy can diffract into the shadow zone, so
that some sound can be detected at the ground. Naturally, the intensity tends to be reduced
when compared to the no-wind channel. We also note that this effect at low frequencies has
been detected experimentally,® suggesting that a low-frequency propagation model such as
the parabolic approximation can be used to interpret experimental data.

Next, we compare calculations done at two frequencies and two wind conditions. Figure 5
displays relative intensity versus range for four different cases. The receiver is located on
the ground surface. First, note the calculations for the 10 Hz no-wind case and the 10 Hz
downwind case. These two curves clearly illustrate the differences that can occur in the
presence of a wind. Even more striking is a comparison between calculations performed
with and without wind but at a source frequency of 100 Hz. At this higher frequency,
relative intensity would be expected to exhibit more “ray-like” behavior. In the absence of
a wind, the high frequency curve is very nearly identical to the low frequency curve. At 100
Hz, the downwind curve exhibits a strong oscillatory behavior, a consequence of
interference effects resulting from multi-mode propagation. This does not occur in the low
frequency case, which exhibits no interference pattern at all, again suggesting the predictive
power of a low-frequency propagation model such as the parabolic approximation.
4. SUMMARY. We have briefly described the utility of the parabolic approximation for
predicting the relative intensity of sound propagating in a moving atmosphere. After
sketching the development of this model, we applied it to an idealized atmospheric waveguide
containing a steady height-dependent wind. We found that the parabolic approximation
yields computed acoustic fields for low-frequency sources which qualitatively differ from
those predicted by ray theory. We emphasize that significantly more complicated
environments, which might include irregular ground topography, penetrable ground surfaces,
and range-dependent sound speed conditions in the medium, can be handled by this
numerical implementation.
Abstract

Viscoelastic materials with fading memory, e.g., polymers, suspensions, and emulsions,
exhibit behavior that is intermediate between the nonlinear hyperbolic response of purely
elastic materials and the strongly diffusive, parabolic response of viscous fluids. Many
popular numerical methods used in the computation of (supposedly steady) viscoelastic fluid
flows appear to fail in physically relevant regions of parameter space and thus do not capture
important phenomena. It is found that a key to a satisfactory explanation of significant
non-Newtonian phenomena is to study the fully dynamic governing system of equations. We
present results obtained using three classes of numerical methods that accurately represent
the dynamics, and we discuss analytical results for related models. We reproduce
experimental results on non-Newtonian “spurt” for shearing flow through a slit die and other
related phenomena associated with the non-monotone constitutive relation of the shear
stress vs. shear strain rate. We conclude that our results provide a physically reasonable
explanation of spurt, hysteresis, and shape memory. Moreover, experiments are suggested to
verify our approach.

1.  Introduction 

Viscoelastic materials with fading memory, e.g., polymers, suspensions, and emulsions,
exhibit behavior that is intermediate between the nonlinear hyperbolic response of purely
elastic materials and the strongly diffusive, parabolic response of viscous fluids. They
incorporate a subtle dissipative mechanism induced by effects of the fading memory.
Understanding the equations of motion, coupled with various constitutive assumptions, at
the mathematical level is crucial for modeling, for the design of algorithms, and for the
computation of particular problems. Shear flows of viscoelastic fluids exhibit a variety of
interesting physical phenomena of importance, for example, in polymer processing. We have
been intrigued by the fact that many numerical methods used in the computation of
(supposedly steady) viscoelastic fluid flows appear to fail in physically relevant regions of
parameter space and thus do not capture important phenomena. One such phenomenon is
“spurt,” the occurrence of which in shear flows of non-Newtonian fluids through a capillary
has been confirmed by careful experiments (Vinogradov, et al. [IS]). The understanding of
this and related phenomena has proved to be of surprising physical, mathematical, and
computational interest.

Our  goals  in  this  study  are: 

1.  To  understand  the  physical  model:  How  do  the  computed  solutions  correspond  to  the 
molecular  or  continuum  model  on  which  they  are  based?  Can  the  character  of  these 
solutions  serve  to  validate  the  physical  model  or  suggest  improvements  in  it? 

2.  To  understand  the  physical  consequences  of  the  model:  Do  the  solutions  obtained 
make  physical  sense?  Do  solutions  that  have  mathematically  interesting  character 
correspond  to  observed  phenomena?  Do  they  predict  behavior  that  should  be  studied 
in  the  laboratory?  What  solutions  to  the  problem  are  relevant  to  processing  and 
design? 

3.  To  understand  qualitative  properties  of  the  mathematical  model:  the  global  existence 
and  uniqueness  of  solutions,  dependence  on  data,  regularity  and  asymptotic  behavior 
of  solutions  for  large  time,  approach  to  steady  states,  etc. 

4.  To  design  numerical  methods  that  account  for  the  mathematics  and  reproduce  the 
physics. 

The outline of this paper is as follows. In §2, we discuss the modelling of spurt and related physical phenomena in capillary flow as a fully time-dependent one-dimensional flow through a slit die, using the Johnson-Segalman differential constitutive relation. In §3, we derive a one-dimensional initial-boundary-value problem for shearing flows through a slit die, starting with the 3-D equations. In §4, we present mathematical results for the governing system and for related model problems that capture some of the key phenomena. In §5, we describe three numerical methods and present a variety of results of physical interest obtained using them, including a comparison with experimental data for the spurt phenomenon. In §6, we discuss our conclusions.

2.  Physical  Phenomena 

Interesting phenomena have been observed by Vinogradov et al. [18] in the flow of viscoelastic fluids (monodisperse polyisoprenes) through capillaries. They found that the volumetric flow rate increased dramatically at a critical stress that was independent of molecular weight. This phenomenon, which is called “spurt,” had been overlooked or dismissed by rheologists because no plausible mechanism to explain it in the context of steady flows was known. Spurt was lumped together with instabilities such as “slip,” “apparent slip,” and “melt fracture,” which are poorly understood. While regarded as anomalous, these instabilities can severely disrupt polymer processes; they can be avoided in practice only with ad hoc engineering expedients. The mechanisms of such phenomena are not understood, primarily because the governing equations are analytically intractable and because popular numerical methods for steady flows fail to capture these dramatic non-Newtonian effects.

Several explanations have been offered for the spurt phenomenon [2, 4, 9, 12]. Their common feature is that the shear stress in steady flow does not vary monotonically with shear strain rate (as illustrated in Fig. 2). These explanations have been rejected by many rheologists as being somehow unphysical. We believe that this criticism is unfounded because it is based on intuition derived from generalized Newtonian models of non-Newtonian fluids.

A key to a satisfactory explanation of the spurt phenomenon is the dynamical behavior of the governing equations. While there is a great variety of constitutive models for viscoelastic fluids, the dynamical behavior of many of them is inaccessible to analysis. In this paper, we model the spurt phenomenon using the Johnson-Segalman model [7] as the constitutive relation. This model correctly captures the spurt phenomenon, and yet is sufficiently simple to be understood through a combination of analysis, asymptotics, and numerical simulation.

We study idealized flow through a narrow slit die. Assuming that the driving pressure is transmitted instantaneously, the three-dimensional flow may be approximated by a one-dimensional problem. Our analytical and numerical results show that flow in a slit die reflects the essential features observed for capillaries. We believe that this is because the spurt phenomenon depends solely on material properties and the smallest physical dimension of the problem.

A non-monotone stress-strain-rate relation of the kind that causes the spurt phenomenon arises when the fluid behavior is characterized by multiple relaxation times. Interpretation of small-amplitude oscillatory shear data [18] indicates that the relaxation times are widely spaced. Formal asymptotic analysis [10] of the dynamics shows that the effects of the smallest relaxation time are mimicked by a Newtonian viscosity term. For simplicity, we study the Johnson-Segalman model with a single relaxation time and added Newtonian viscosity.


3.  Mathematical  Formulation 

The  motion  of  a  fluid  under  incompressible  and  isothermal  conditions  is  governed  by 
the  balance  of  linear  momentum 


P 


'dw 

dt 


+  V  •  Vv 


VS. 


(3.1) 


Here, $\rho$ is the fluid density, $\mathbf{v}$ is the particle velocity, and $\mathbf{S}$ is the stress tensor. The response characteristics of the fluid are embodied in the constitutive relation for the stress. For viscoelastic fluids with fading memory, these relations specify the stress as a functional of the deformation history of the fluid. Many sophisticated constitutive models have been devised; see Ref. [1] for a survey. In the present work, we focus on the Johnson-Segalman model [7] as a prototype for general constitutive models. This model accounts for non-affine deformation of Gaussian networks by introducing a slip parameter $a$, $-1 < a < 1$, leading to a nonlinear generalization of the classical Maxwell model.

To  specify  this  constitutive  relation,  we  decompose  the  stress  as 


$$ \mathbf{S} = -p\,\mathbf{I} + 2\eta\,\mathbf{D} + \boldsymbol{\Sigma} . \qquad (3.2) $$
In this equation, $p$ is an isotropic pressure (which is determined from the incompressibility constraint), $\eta$ is the coefficient of Newtonian viscosity, and $\boldsymbol{\Sigma}$ is the non-Newtonian extra stress. Also, we let $\mathbf{D} := \tfrac12[\nabla\mathbf{v} + (\nabla\mathbf{v})^{T}]$ and $\boldsymbol{\Omega} := \tfrac12[\nabla\mathbf{v} - (\nabla\mathbf{v})^{T}]$ be the symmetric and antisymmetric parts of the velocity gradient $\nabla\mathbf{v}$, which has components $(\nabla\mathbf{v})_{ij} := \partial v^i/\partial x^j$. The extra stress is specified by the differential constitutive law

$$ \overset{\circ}{\boldsymbol{\Sigma}} = 2\mu\,\mathbf{D} - \lambda\,\boldsymbol{\Sigma} , \qquad (3.3) $$

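where

$$ \overset{\circ}{\boldsymbol{\Sigma}} := \frac{\partial\boldsymbol{\Sigma}}{\partial t} + \mathbf{v}\cdot\nabla\boldsymbol{\Sigma} + \boldsymbol{\Sigma}(\boldsymbol{\Omega} - a\mathbf{D}) + (\boldsymbol{\Omega} - a\mathbf{D})^{T}\boldsymbol{\Sigma} \qquad (3.4) $$

is the objective time derivative of $\boldsymbol{\Sigma}$ with parameter $a$. The parameter $\mu$ is an elastic shear modulus, and $\lambda$ is a relaxation rate.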

Constitutive relations such as Eq. 3.3 exhibit a mixture of elastic and viscous behavior. This may be seen heuristically as follows. In the long-relaxation-time limit, $\lambda \to 0$, Eq. 3.3 shows that an objective time derivative of $\boldsymbol{\Sigma}$ is proportional to the deformation rate: $\overset{\circ}{\boldsymbol{\Sigma}} \approx 2\mu\mathbf{D}$. This is characteristic of elastic behavior, and leads to the interpretation of $\mu$ as a shear modulus. By contrast, when $\lambda, \mu \to \infty$ with $\mu/\lambda$ fixed, $\boldsymbol{\Sigma} \approx 2(\mu/\lambda)\mathbf{D}$; thus, the model displays viscous behavior with $\mu/\lambda$ being the Newtonian shear viscosity coefficient.
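The two limits can be made concrete with a minimal sketch. Assume start-up of steady shear at a constant rate $\dot\gamma$ and keep only the linear part of the shear component of Eq. 3.3 (the normal-stress coupling terms are dropped, an assumption made purely for illustration). The shear stress then obeys $\sigma_t = \mu\dot\gamma - \lambda\sigma$, with exact solution $\sigma(t) = (\mu\dot\gamma/\lambda)(1 - e^{-\lambda t})$. The helper names below are hypothetical. When $\lambda t \ll 1$ the stress equals $\mu$ times the accumulated strain (elastic), and for large $\lambda$, $\mu$ with $\mu/\lambda$ fixed it equals $(\mu/\lambda)\dot\gamma$ (viscous).

```python
import math


def maxwell_shear_stress(t, mu, lam, shear_rate):
    """Exact solution of sigma_t = mu*shear_rate - lam*sigma with sigma(0) = 0.

    Linearized shear component of Eq. 3.3; normal-stress coupling is dropped.
    """
    return (mu * shear_rate / lam) * -math.expm1(-lam * t)


def elastic_ratio(t, mu, lam, shear_rate):
    """Stress divided by the elastic prediction mu * (accumulated strain)."""
    return maxwell_shear_stress(t, mu, lam, shear_rate) / (mu * shear_rate * t)


def viscous_ratio(t, mu, lam, shear_rate):
    """Stress divided by the viscous prediction (mu / lam) * shear_rate."""
    return maxwell_shear_stress(t, mu, lam, shear_rate) / (mu / lam * shear_rate)


# Elastic regime: lam * t << 1 (relaxation time long compared with t).
print(elastic_ratio(t=1e-3, mu=2.0, lam=0.01, shear_rate=3.0))   # ~1

# Viscous regime: lam, mu -> infinity with mu / lam = 0.5 fixed, at fixed t.
for scale in (1.0, 10.0, 100.0):
    print(viscous_ratio(t=1.0, mu=0.5 * scale, lam=scale, shear_rate=3.0))  # tends to 1
```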

Essential properties of the constitutive relation are exhibited in simple planar shear flow. With the flow aligned along the $y$-axis (see Fig. 1), the flow variables are independent of $y$. Therefore, the velocity field is $\mathbf{v} = (0, v(x,t))$, and the balance of mass is automatically satisfied. Furthermore, the components of the extra stress tensor $\boldsymbol{\Sigma}$ may be written $\Sigma^{xx} = \gamma(x,t)$, $\Sigma^{xy} = \Sigma^{yx} = \sigma(x,t)$, and $\Sigma^{yy} = \tau(x,t)$, while the pressure takes the form $p = p_0(x,t) - f(t)\,y$, $f$ being the pressure gradient driving the flow. In these terms, Eq. 3.3 becomes


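$$ \gamma_t + (1-a)\,\sigma v_x = -\lambda\gamma , \qquad (3.5a) $$

$$ \sigma_t - \left[\tfrac12(1+a)\gamma - \tfrac12(1-a)\tau + \mu\right] v_x = -\lambda\sigma , \qquad (3.5b) $$

$$ \tau_t - (1+a)\,\sigma v_x = -\lambda\tau . \qquad (3.5c) $$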

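Introducing the variables $Z := \tfrac12(1+a)\gamma - \tfrac12(1-a)\tau$ and $W := \tfrac12(1+a)\gamma + \tfrac12(1-a)\tau$, Eqs. 3.5 simplify to

$$ \sigma_t - (Z + \mu)\,v_x = -\lambda\sigma , \qquad (3.6a) $$

$$ Z_t + (1-a^2)\,\sigma v_x = -\lambda Z , \qquad (3.6b) $$

$$ W_t = -\lambda W . \qquad (3.6c) $$

Every nonzero solution of Eq. 3.6c grows exponentially as $t \to -\infty$; because $W$ must remain finite, $W = 0$, and the last equation may be omitted. As a result, $Z = -\tfrac12(1-a^2)(\tau - \gamma)$, where $\Sigma^{yy} - \Sigma^{xx} = \tau - \gamma$ is the principal normal stress difference.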
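The reduction from Eqs. 3.5 to Eqs. 3.6 is pure algebra, and it can be spot-checked numerically. The sketch below (the helper names `rates_35` and `residuals_36` are hypothetical) evaluates the right-hand sides of Eqs. 3.5 at random states, forms $Z$ and $W$ as defined above, and confirms that Eqs. 3.6a-c hold to rounding error. It also checks that $W = 0$ implies $Z = -\tfrac12(1-a^2)(\tau-\gamma)$. Only local terms are needed, because the convective derivative vanishes in this shear flow.

```python
import random


def rates_35(gamma, sigma, tau, vx, a, mu, lam):
    """Time derivatives (gamma_t, sigma_t, tau_t) given by Eqs. 3.5."""
    gamma_t = -(1 - a) * sigma * vx - lam * gamma
    sigma_t = (0.5 * (1 + a) * gamma - 0.5 * (1 - a) * tau + mu) * vx - lam * sigma
    tau_t = (1 + a) * sigma * vx - lam * tau
    return gamma_t, sigma_t, tau_t


def residuals_36(gamma, sigma, tau, vx, a, mu, lam):
    """Residuals of Eqs. 3.6a-c; all three should vanish identically."""
    gamma_t, sigma_t, tau_t = rates_35(gamma, sigma, tau, vx, a, mu, lam)
    Z = 0.5 * (1 + a) * gamma - 0.5 * (1 - a) * tau
    W = 0.5 * (1 + a) * gamma + 0.5 * (1 - a) * tau
    Z_t = 0.5 * (1 + a) * gamma_t - 0.5 * (1 - a) * tau_t
    W_t = 0.5 * (1 + a) * gamma_t + 0.5 * (1 - a) * tau_t
    return (
        sigma_t - (Z + mu) * vx + lam * sigma,      # (3.6a)
        Z_t + (1 - a * a) * sigma * vx + lam * Z,   # (3.6b)
        W_t + lam * W,                              # (3.6c)
    )


random.seed(0)
worst = 0.0
for _ in range(1000):
    state = [random.uniform(-2.0, 2.0) for _ in range(4)]   # gamma, sigma, tau, vx
    a = random.uniform(-0.99, 0.99)
    mu, lam = random.uniform(0.1, 5.0), random.uniform(0.1, 5.0)
    worst = max(worst, *map(abs, residuals_36(*state, a, mu, lam)))
print(worst)   # rounding-level

# With W = 0, gamma = -(1 - a) tau / (1 + a), and Z = -(1/2)(1 - a^2)(tau - gamma).
a, tau = 0.3, 1.7
gamma = -(1 - a) * tau / (1 + a)
Z = 0.5 * (1 + a) * gamma - 0.5 * (1 - a) * tau
print(abs(Z + 0.5 * (1 - a * a) * (tau - gamma)))   # rounding-level
```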

Combining  the  constitutive  law  3.6  with  the  balance  of  linear  momentum  3.1,  we  are 
led  to  the  system  of  equations 

$$ \rho v_t - \sigma_x = \eta v_{xx} + f , \qquad (3.7a) $$

$$ \sigma_t - (Z + \mu)\,v_x = -\lambda\sigma , \qquad (3.7b) $$

$$ Z_t + (1-a^2)\,\sigma v_x = -\lambda Z . \qquad (3.7c) $$


Fig.  1:  Shear  flow  through  a  slit-die. 


In this paper, we study shear flow between two parallel plates, located at $x = \pm h/2$. By symmetry, we need only consider the flow on the interval $[-h/2, 0]$. The no-slip condition at the plate implies the boundary condition $v(-h/2,t) = 0$, while symmetry imposes that $v_x(0,t) = 0$. We also prescribe initial values for $v$, $\sigma$, and $Z$, which must be compatible with the boundary conditions. To conform with the symmetry, we require that $\sigma(0,0) = 0$; then, according to Eq. 3.7b, $\sigma(0,t) = 0$ for all time.

To eliminate unnecessary parameters, we scale distance by $h$, time by $\lambda^{-1}$, and the stresses $\sigma$ and $Z$ by $\mu$. Furthermore, if we replace $\sigma$, $v$, and $f$ by $\hat\sigma := (1-a^2)^{1/2}\sigma$, $\hat v := (1-a^2)^{1/2}v$, and $\hat f := (1-a^2)^{1/2}f$, respectively, then the parameter $a$ disappears from Eqs. 3.7. Since no confusion will arise, we omit the caret. The dimensionless parameters are $\alpha := \rho h^2\lambda^2/\mu$ and $\varepsilon := \eta\lambda/\mu$. Consequently, we study the initial-boundary-value problem for the system


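$$
\begin{aligned}
\alpha v_t - \sigma_x &= \varepsilon v_{xx} + f , \\
\sigma_t - (Z+1)\,v_x &= -\sigma , \\
Z_t + \sigma v_x &= -Z ,
\end{aligned}
\qquad \text{(JS)}
$$

on the interval $[-1/2, 0]$, with boundary conditions

$$ v(-1/2, t) = 0 \quad \text{and} \quad v_x(0, t) = 0 \qquad \text{(BC)} $$

and initial conditions

$$ v(x,0) = v_0(x) , \quad \sigma(x,0) = \sigma_0(x) , \quad \text{and} \quad Z(x,0) = Z_0(x) , \qquad \text{(IC)} $$

where $v_0(-1/2) = 0$, $v_0'(0) = 0$, and $\sigma_0(0) = 0$.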
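For concreteness, here is a small sketch of the scalings just described. The function names are hypothetical, and the inputs are arbitrary illustrative numbers in consistent units, not data from this paper. It computes $\alpha = \rho h^2\lambda^2/\mu$ and $\varepsilon = \eta\lambda/\mu$, and checks that the substitution $\hat\sigma = (1-a^2)^{1/2}\sigma$, $\hat v = (1-a^2)^{1/2}v$ removes $a$ from the coupling term of the $Z$ equation, $(1-a^2)\sigma v_x = \hat\sigma\hat v_x$.

```python
import math


def dimensionless_groups(rho, h, lam, mu, eta):
    """alpha = rho h^2 lam^2 / mu and eps = eta lam / mu (consistent units assumed)."""
    alpha = rho * h**2 * lam**2 / mu
    eps = eta * lam / mu
    return alpha, eps


def remove_slip_parameter(sigma, vx, f, a):
    """Rescale (sigma, v_x, f) by sqrt(1 - a^2) so that `a` drops out of Eqs. 3.7."""
    s = math.sqrt(1.0 - a * a)
    return s * sigma, s * vx, s * f


alpha, eps = dimensionless_groups(rho=2.0, h=3.0, lam=0.5, mu=4.0, eta=8.0)
print(alpha, eps)   # 1.125 1.0

# The coupling term of the Z equation is unchanged by the rescaling.
a, sigma, vx = 0.4, 0.7, 1.3
sigma_hat, vx_hat, _ = remove_slip_parameter(sigma, vx, 0.0, a)
print(abs((1 - a * a) * sigma * vx - sigma_hat * vx_hat))   # rounding-level
```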

The steady-state solutions of (JS), when the forcing term $f$ is a constant $\bar f$, play an important role in our discussion. Such a solution, denoted by $\bar v$, $\bar\sigma$, and $\bar Z$, is given as follows. The stress components $\bar\sigma$ and $\bar Z$ are related to the velocity gradient $\bar v_x$ (which, in dimensionless units, is the Deborah number) through


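$$ \bar\sigma = \frac{\bar v_x}{1 + \bar v_x^2} \qquad (3.8) $$

and

$$ \bar Z + 1 = \frac{1}{1 + \bar v_x^2} . \qquad (3.9) $$

Therefore, the total steady shear stress, which is defined by $T := \bar\sigma + \varepsilon\bar v_x$, takes the form

$$ T(\bar v_x) = \frac{\bar v_x}{1 + \bar v_x^2} + \varepsilon\,\bar v_x . \qquad (3.10) $$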


In this manner, a non-monotone relation between shear stress and strain rate, shown in Fig. 2, arises naturally in the Johnson-Segalman model.
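The non-monotone range can be located in closed form. Writing $s = \bar v_x$, Eq. 3.10 gives $T'(s) = (1-s^2)/(1+s^2)^2 + \varepsilon$. Setting $u = s^2$, the condition $T'(s) = 0$ becomes $\varepsilon u^2 + (2\varepsilon - 1)u + 1 + \varepsilon = 0$, whose discriminant is $1 - 8\varepsilon$. Hence, for $\varepsilon > 0$, $T$ has a local maximum and a local minimum exactly when $\varepsilon < 1/8$. The following sketch evaluates them. The function names are illustrative, not taken from any code described in this paper.

```python
import math


def total_steady_stress(s, eps):
    """T(s) = s/(1 + s^2) + eps*s, Eq. 3.10 with s the steady strain rate."""
    return s / (1.0 + s * s) + eps * s


def dT_ds(s, eps):
    """T'(s) = (1 - s^2)/(1 + s^2)^2 + eps."""
    return (1.0 - s * s) / (1.0 + s * s) ** 2 + eps


def critical_strain_rates(eps):
    """(s_max, s_min), where T has its local max and min, or None if T is monotone."""
    if eps <= 0.0:
        raise ValueError("this sketch assumes eps > 0")
    disc = 1.0 - 8.0 * eps
    if disc <= 0.0:
        return None                                  # eps >= 1/8: T is monotone
    root = math.sqrt(disc)
    u_max = (1.0 - 2.0 * eps - root) / (2.0 * eps)   # roots in u = s^2
    u_min = (1.0 - 2.0 * eps + root) / (2.0 * eps)
    return math.sqrt(u_max), math.sqrt(u_min)


for eps in (0.01, 0.1, 0.2):
    crit = critical_strain_rates(eps)
    if crit is None:
        print(f"eps = {eps}: T is monotone")
    else:
        s_max, s_min = crit
        print(f"eps = {eps}: local max T = {total_steady_stress(s_max, eps):.4f} at {s_max:.4f}, "
              f"local min T = {total_steady_stress(s_min, eps):.4f} at {s_min:.4f}")
```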

The  total  steady  shear  stress  satisfies 


$$ T(\bar v_x) + \bar f x = 0 \quad \text{for } x \in [-\tfrac12, 0] , \qquad (3.11) $$


so that the velocity gradient may be expressed in terms of $x$. However, because the function $T$ of Eq. 3.10 is not monotone, $\bar v_x$ may take up to three distinct values for any given $x$. The steady velocity profile, shown in Fig. 3, is obtained by integrating $\bar v_x$ and using the boundary condition $\bar v(-1/2) = 0$. Notice that $\bar v_x$ may suffer jump discontinuities, resulting in kinks in the velocity profile (see Fig. 3).
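The construction of the steady profile can be sketched in a few lines. Eq. 3.11 does not say which root of $T(\bar v_x) = -\bar f x$ to take, so the sketch adopts one hypothetical selection rule, used here only for illustration. It stays on the lower (classical) branch wherever that branch exists, and jumps to the upper branch only where $-\bar f x$ exceeds the local maximum of $T$ (cf. “top jumping” in §4). Roots are found by bisection on each increasing branch, and $\bar v$ is obtained by trapezoidal integration from $\bar v(-1/2) = 0$. The value $\varepsilon = 0.01$ is an arbitrary choice below $1/8$.

```python
import math

import numpy as np


def total_steady_stress(s, eps):
    """T(s) = s/(1 + s^2) + eps*s, Eq. 3.10 (works on floats and arrays)."""
    return s / (1.0 + s * s) + eps * s


def critical_strain_rates(eps):
    """Local max and min of T for 0 < eps < 1/8 (closed form)."""
    root = math.sqrt(1.0 - 8.0 * eps)
    u_max = (1.0 - 2.0 * eps - root) / (2.0 * eps)
    u_min = (1.0 - 2.0 * eps + root) / (2.0 * eps)
    return math.sqrt(u_max), math.sqrt(u_min)


def bisect(fun, lo, hi, iters=80):
    """Root of an increasing function on [lo, hi], assuming fun(lo) <= 0 <= fun(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fun(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def steady_profile(f_bar, eps=0.01, n=1001):
    """Steady strain rate and velocity on [-1/2, 0], using top jumping."""
    s_max, s_min = critical_strain_rates(eps)
    t_max = total_steady_stress(s_max, eps)
    x = np.linspace(-0.5, 0.0, n)
    vx = np.empty(n)
    for i, xi in enumerate(x):
        target = -f_bar * xi                        # Eq. 3.11: T(v_x) = -f x >= 0

        def residual(s, target=target):
            return total_steady_stress(s, eps) - target

        if target <= t_max:                         # lower (classical) branch
            vx[i] = bisect(residual, 0.0, s_max)
        else:                                       # only the upper branch exists
            vx[i] = bisect(residual, s_min, s_min + target / eps + 1.0)
    # Integrate v_x from the wall, where v(-1/2) = 0.
    v = np.concatenate(([0.0], np.cumsum(0.5 * (vx[1:] + vx[:-1]) * np.diff(x))))
    return x, vx, v


x, vx, v = steady_profile(0.8)        # wall stress 0.4: below the local max of T
x2, vx2, v2 = steady_profile(1.4)     # wall stress 0.7: a jump in v_x appears
print(v[-1], v2[-1])                  # centerline velocities
```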

Traditionally, a non-monotone relation between stress and strain rate is regarded as a defect of the constitutive law. This conclusion is based on intuition appropriate for generalized Newtonian models of non-Newtonian fluids. Shear flow for such a fluid is governed by the single equation

$$ \rho v_t - [\eta(v_x)\,v_x]_x = f , \qquad (3.12) $$

corresponding to a viscosity coefficient $\eta$ that depends on strain rate. In a flow regime where $\eta(v_x)v_x$ decreases with strain rate $v_x$, however, Eq. 3.12 has the character of a backward heat equation, which suffers from the Hadamard instability. Therefore, for generalized Newtonian fluids, $\eta(v_x)v_x$ must increase with $v_x$ in a physically stable steady solution.
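The backward-heat-equation character can be seen from a short linear calculation. It is given here as a sketch under simplifying assumptions: an unbounded domain, $f = 0$, and a uniform base shear $v = s_0 x$, which is then an exact steady solution of Eq. 3.12. Writing $v = s_0 x + w$ and linearizing in $w$ gives

$$ \rho\, w_t = c\, w_{xx}, \qquad c := \left.\frac{d}{ds}\bigl[\eta(s)\,s\bigr]\right|_{s = s_0}, \qquad w = \hat w\, e^{ikx + \omega t} \;\Longrightarrow\; \omega = -\frac{c\,k^2}{\rho} . $$

If $c < 0$, then $\omega = |c|\,k^2/\rho$ grows without bound as $k \to \infty$. Arbitrarily short-wavelength perturbations therefore grow arbitrarily fast, and solutions do not depend continuously on the data.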

The system (JS) has the same steady solutions as a generalized Newtonian fluid with $\eta(v_x)v_x = T(v_x)$, so one might think that it exhibits the same instability in regions where $T$ decreases. This conclusion is not warranted, however, because the system (JS) maintains its evolutionary character when $\varepsilon > 0$.


4.  Mathematical  Results 

Several mathematical results are known for the system (JS); we refer to Refs. [6, 17, 3, 10, 14, 4] for further discussions and additional references.

Fig. 2: Total steady shear stress $T$ vs. shear strain rate $\bar v_x$ for steady flow.

(1) When the viscosity parameter $\varepsilon = 0$, the quasi-linear system (JS) is strictly hyperbolic provided that $Z + 1 > 0$. In this case, the wave speeds are $\pm[(Z+1)/\alpha]^{1/2}$ and zero. If, on the other hand, $Z + 1$ becomes negative, then (JS) with $\varepsilon = 0$ undergoes a change of type and loses its evolutionary character. Joseph, Renardy, and Saut [6] have associated this change of type with certain fluid instabilities.

(2) Let $\varepsilon = 0$ and $f = 0$; assume that the initial data are smooth and lie in the hyperbolic region. If the data have sufficiently small variation, then a unique classical solution of (JS) exists globally in time. Moreover, the solution decays to zero as $t \to \infty$. This can be proved using the energy methods discussed in Ref. [17].

On the other hand, if the data have sufficiently large variation, then the classical solution blows up within finite time: $|v_x|$, $|\sigma_x|$, and $|Z_x|$ approach infinity as $t$ approaches a finite critical time. This is proved in Ref. [17] using the method of characteristics.

Thus, the fading memory acts as a weak dissipative mechanism: the source terms in the equations serve to counteract the formation of singularities from sufficiently smooth data. When discontinuities do form, system (JS) is no longer valid: the products of distributions $Z v_x$ and $\sigma v_x$ are ill-defined. (See the discussion under (4), below.)

Fig. 3: Velocity profile for steady flow.

(3) If $\varepsilon > 0$, the system (JS) is evolutionary, but it cannot be classified according to type. Recently, Guillopé and Saut [3] established the global existence of solutions of (JS) for planar Couette and Poiseuille flow with data of arbitrary size. They also studied the asymptotic (Lyapunov) stability of steady states in the Couette case.

(4) It is important to observe that (JS) is not in conservation form. The evolution of a Johnson-Segalman fluid is, in fact, governed by physical conservation laws [7]. A conservative formulation of (JS) must be used when solutions are discontinuous.

Following Plohr [16], we introduce the “elastic part” $r$ of the shear strain and the “entropy” variable $z$ through the relations

$$ \sigma = z\sin r , \qquad Z + 1 = z\cos r . $$

Then system (JS) is transformed into the equivalent system

$$
\begin{aligned}
r_t - v_x &= -z^{-1}\sin r , && (4.1a)\\
\alpha v_t - [\sigma(r,z) + \varepsilon v_x]_x &= f , && (4.1b)\\
z_t &= -(z - \cos r) , &&
\end{aligned}
\qquad \text{(C)}
$$
which is in conservative (i.e., divergence) form. Furthermore, if the internal energy $\mathcal{E}$ is defined by

$$ \alpha\,\mathcal{E}(r,z) := 1 - z\cos r , \qquad (4.2) $$

the energy is dissipated according to the equation

$$ \alpha\left[\tfrac12 v^2 + \mathcal{E}(r,z)\right]_t - \{[\sigma(r,z) + \varepsilon v_x]\,v\}_x = v f - \alpha\,\mathcal{E}(r,z) - \varepsilon v_x^2 . \qquad (4.3) $$

The  conservative  formulation  (C)  of  (JS)  is  used  in  one  of  the  numerical  methods  discussed 
in  §5. 

(5) More detailed analytical results are obtained by simplifying the system (JS). A model system that incorporates several qualitative features of (JS) is obtained by freezing $Z$ at its equilibrium value, $Z + 1 = 1/(1 + v_x^2)$. Defining $g(v_x) := v_x/(1 + v_x^2)$, system (JS) becomes

$$
\begin{aligned}
\alpha v_t - \sigma_x &= \varepsilon v_{xx} + f , \\
\sigma_t - g(v_x) &= -\sigma .
\end{aligned}
\qquad \text{(M)}
$$

More generally, $g$ may be any smooth, odd function. The boundary and initial conditions for $v$ and $\sigma$ are the same as in (BC) and (IC). We assume that $\varepsilon > 0$ and that $f$ is the constant $\bar f$. The function $g$ is related to the steady stress-strain-rate relation through $T(v_x) = g(v_x) + \varepsilon v_x$. A steady solution of (M) satisfies $\bar\sigma = g(\bar v_x)$ and $T(\bar v_x) + \bar f x = 0$, just as for the system (JS).

Nohel, Pego, and Tzavaras [14] have shown that the global classical solution $v$, $\sigma$ of (M), (BC), (IC) has the following properties.

(a) With $S := \sigma + \varepsilon v_x + \bar f x$, $S(x,t) \to 0$ as $t \to \infty$, uniformly for $x \in [-1/2, 0]$.

(b) There exists a steady state $\bar v$, $\bar\sigma$ such that for each $x \in [-1/2, 0]$, $v(x,t) \to \bar v(x)$, $v_x(x,t) \to \bar v_x(x)$, and $\sigma(x,t) \to \bar\sigma(x)$ as $t \to \infty$. We emphasize that the steady velocity gradient $\bar v_x$ and stress $\bar\sigma$ may be discontinuous (as in Fig. 2).

(c) Let $\bar v$, $\bar\sigma$ be a steady state such that

$$ T'(\bar v_x) = g'(\bar v_x) + \varepsilon \ge \text{const.} > 0 . \qquad (4.4) $$

(Referring to Fig. 2, inequality 4.4 precludes top and bottom jumping and excludes the region where $T(v_x)$ decreases.) Consider a union $U$ of small subintervals of $-1/2 < x < 0$ that are centered at points where $\bar v_x$ and $\bar\sigma$ are discontinuous. Let smooth initial data be chosen such that $|S(x,0)|$ is sufficiently small except in $U$. Then the solution of (M) converges to the steady state $\bar v$, $\bar\sigma$ on the complement of $U$. Moreover, the measure of $U$ can be made arbitrarily small by choosing $|S(x,0)|$ small enough. In this sense, steady states are stable (even if $\bar v_x$ and $\bar\sigma$ are discontinuous).

The  numerical  results  discussed  in  §5  suggest  that  similar  results  hold  for  the  system  (JS). 
Proofs  for  (JS)  are  under  investigation. 
(6) The model problem (M) was also studied by Hunter and Slemrod [4]. In their construction of the model, the steady-state relation $\bar\sigma = g(\bar v_x)$ between the stress and strain rate is chosen to be $g_{HS}(v_x) := \sigma_{HS}(v_x) - \varepsilon v_x$, where the graph of the function $\sigma_{HS}$ resembles Fig. 2 (but is independent of $\varepsilon$). Hunter and Slemrod base their analysis on the conservation laws

$$ w_t - u_x = 0 , \qquad (4.5a) $$

$$ \alpha u_t - \sigma_{HS}(w)_x = \varepsilon u_{xx} - \alpha u + f \qquad (4.5b) $$

for the acceleration $u = v_t$ and the strain rate $w = v_x$. Thus, jumps in the strain rate $v_x$ correspond to steady shock waves for the system 4.5 with $\varepsilon = 0$. Based on a local dynamical analysis of shock structure for small $\varepsilon$, the centerline velocity is shown to exhibit hysteresis under quasi-static cycling of the pressure gradient. (This same behavior is observed in the numerical simulation of the system (JS); see §5.) We emphasize, however, that this analysis cannot be applied to the model problem (M) as derived from the Johnson-Segalman system (JS), because the function $g_{JS}(v_x) = v_x/(1 + v_x^2)$ decays to zero at high strain rate.
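Two statements in this section, remarks (1) and (4), are easy to spot-check numerically. The sketch below does so under stated assumptions: the function names are hypothetical, and NumPy is assumed to be available. For remark (1), it writes (JS) with $\varepsilon = 0$ as $U_t + A(U)U_x = (\text{source})$ for $U = (v, \sigma, Z)$ and computes the eigenvalues of $A$. These should be $0$ and $\pm[(Z+1)/\alpha]^{1/2}$, becoming complex when $Z + 1 < 0$. For remark (4), it differentiates $r = \operatorname{atan2}(\sigma, Z+1)$ and $z = [\sigma^2 + (Z+1)^2]^{1/2}$ along the local (JS) rates by central differences and compares the results with the right-hand sides of (C). It also checks the pointwise energy balance $(\alpha\mathcal{E})_t = -\alpha\mathcal{E} + \sigma v_x$ implied by Eq. 4.2.

```python
import math
import random

import numpy as np


def characteristic_speeds(Z, sigma, alpha):
    """Eigenvalues of A in U_t + A U_x = source for (JS) with eps = 0, U = (v, sigma, Z)."""
    A = np.array([[0.0, -1.0 / alpha, 0.0],
                  [-(Z + 1.0), 0.0, 0.0],
                  [sigma, 0.0, 0.0]])
    return np.linalg.eigvals(A)


def js_rates(sigma, Z, vx):
    """Local rates (sigma_t, Z_t) from (JS)."""
    return (Z + 1.0) * vx - sigma, -sigma * vx - Z


def to_rz(sigma, Z):
    """Invert sigma = z sin r, Z + 1 = z cos r."""
    return math.atan2(sigma, Z + 1.0), math.hypot(sigma, Z + 1.0)


def conservative_residuals(sigma, Z, vx, dt=1e-5):
    """Residuals of the r- and z-equations of (C) and of the energy balance."""
    s_t, Z_t = js_rates(sigma, Z, vx)
    r_p, z_p = to_rz(sigma + dt * s_t, Z + dt * Z_t)
    r_m, z_m = to_rz(sigma - dt * s_t, Z - dt * Z_t)
    r, z = to_rz(sigma, Z)
    r_t, z_t = (r_p - r_m) / (2 * dt), (z_p - z_m) / (2 * dt)
    energy = 1.0 - z * math.cos(r)                                  # alpha * E, Eq. 4.2
    energy_t = (z_m * math.cos(r_m) - z_p * math.cos(r_p)) / (2 * dt)
    return (r_t - vx + math.sin(r) / z,                             # r-equation of (C)
            z_t + (z - math.cos(r)),                                # z-equation of (C)
            energy_t + energy - sigma * vx)                         # energy balance


print(np.sort(characteristic_speeds(Z=-0.5, sigma=0.3, alpha=2.0)))   # approx -0.5, 0, 0.5
print(characteristic_speeds(Z=-1.5, sigma=0.3, alpha=2.0))            # complex pair

random.seed(1)
worst = max(
    max(abs(res) for res in conservative_residuals(random.uniform(-1.0, 1.0),
                                                   random.uniform(-0.5, 0.5),
                                                   random.uniform(-2.0, 2.0)))
    for _ in range(200)
)
print(worst)   # small: finite-difference and rounding error only
```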

5.  Numerical  Results 

To study the dynamics of system (JS), we developed several different numerical methods; each has its advantages for certain ranges of physical parameters. Calculations with these methods produce similar qualitative and quantitative results.

(1) Solid Mechanics Formulation: In this approach, the system (JS) is regarded as governing the extensional motion of an elastic-plastic bar. The first equation is momentum balance, in which the parabolic term adds viscous “stiffness damping.” The remaining equations are incremental constitutive relations for the stress. The stiffness of the material is reflected in the wave speed $[(Z+1)/\alpha]^{1/2}$. We have observed that the wave speed is diminished under loading, so that the material exhibits plastic softening. (See also Ref. [16] for an interpretation of (JS) as governing a viscoplastic material.)

We have solved the system (JS) numerically using a method motivated by solid mechanics. The momentum equation is cast in Galerkin weak form, with the velocity approximated as piecewise linear and the stress components as piecewise constant. With the time derivative discretized using a trapezoidal approximation, and the shear stress determined through a semi-implicit treatment of its evolution equation, the Galerkin equation is solved for the velocity. Then the stress components are updated using an implicit form of the constitutive equations; further details can be found in Ref. [8]. The stability of this method has been analyzed for the system (JS) with $Z$ frozen [11]: the method is stable provided that $Z + 1 > -\varepsilon$ and the time step is restricted by $\Delta t < 2/\lambda$ in Eq. 3.7, i.e., $\Delta t < 2$ in (JS). For $Z + 1 < -\varepsilon$, the linearized equations and the method are unstable.

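(2) Parabolic Formulation: Recall that the total stress is defined to be $T = \sigma + \varepsilon v_x$. Introducing $T$ as an independent variable for $\varepsilon > 0$, the system (JS) is replaced by

$$
\begin{aligned}
T_t &= \frac{\varepsilon}{\alpha}\,T_{xx} + (Z+1)\,\frac{T - \sigma}{\varepsilon} - \sigma , && (5.1a)\\
\sigma_t &= (Z+1)\,\frac{T - \sigma}{\varepsilon} - \sigma , && (5.1b)\\
Z_t &= -\sigma\,\frac{T - \sigma}{\varepsilon} - Z . && (5.1c)
\end{aligned}
$$

The boundary conditions are $T_x(-1/2, t) = -f$ and $T(0, t) = 0$. The velocity profile may be reconstructed by integrating $(T - \sigma)/\varepsilon$.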

The  system  5.1  has  the  form  of  a  linear  heat  equation  forced  by  a  nonlinear  heat 
source  that  is  governed  by  two  auxiliary  ordinary  differential  equations.  To  solve  this 
system  numerically,  we  discretize  the  parabolic  term  in  5.1a  implicitly  while  treating  the 
remaining  forcing  terms  explicitly.  Time  integration  is  performed  using  a  stiff  ODE  solver. 

We  remark  that  system  5.1  is  convenient  also  for  studying  existence  and  regularity  of 
solutions  of  (JS). 

(3)  Conservative  Formulation:  The  system  (JS)  is  equivalent  to  the  system  (C);  therefore, 
it  may  be  studied  from  the  viewpoint  of  conservation  laws.  In  Ref.  [16],  we  have  determined 
completely  the  structure  of  scale-invariant  nonlinear  waves  for  (C)  when  e  =  0.  Such  a  wave 
consists  of  a  sequence  of  elementary  scale-invariant  waves,  either  centered  discontinuities 
or  rarefaction  waves,  connecting  constant  states  on  the  left  and  right.  Discontinuities  are 
required  to  satisfy  Liu’s  generalization  of  Oleinik’s  entropy  condition,  which  guarantees 
that  energy  is  dissipated  (cf.  Eq.  4.3).  This  admissibility  condition  is  equivalent  to  requiring 
shock waves to have viscous profiles: admissible shock waves arise as limits of traveling-wave solutions of (C) as $\varepsilon \to 0$. Our analysis follows the techniques for general systems of conservation laws discussed in Refs. [13] and [5].

With  the  structure  of  scale-invariant  waves  known,  Riemann  initial-value  problems 
may  be  solved.  We  have  written  a  computer  program  that  solves  Riemann  problems, 
and  have  incorporated  it  into  the  Glimm-Chorin  random  choice  method.  This  method 
solves  the  Cauchy  problem  without  introducing  artificial  Newtonian  viscosity.  We  refer  to 
Ref.  [15]  for  a  detailed  discussion. 
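Returning to the parabolic formulation (2), here is a minimal method-of-lines sketch on $[-1/2, 0]$ of the equation $T_t = (\varepsilon/\alpha)T_{xx} + (Z+1)(T-\sigma)/\varepsilon - \sigma$, with $\sigma$ and $Z$ evolving by the local equations of (JS) written in terms of $v_x = (T-\sigma)/\varepsilon$. It is a simplified stand-in, not the authors' code. Instead of a stiff ODE solver it uses a first-order IMEX step: backward Euler for the diffusion term and forward Euler for all other terms. Space is discretized by second-order central differences, a ghost node imposes $T_x(-1/2,t) = -f$, and $T(0,t) = 0$ is enforced directly. The values $f = 0.5$, $\alpha = 1$, $\varepsilon = 0.1$ are arbitrary and sub-critical, so the solution should relax to the steady state, in which Eq. 3.11 gives $T = -fx$.

```python
import numpy as np


def solve_parabolic(f=0.5, alpha=1.0, eps=0.1, n=101, dt=1e-3, t_end=20.0):
    """IMEX-Euler sketch of the parabolic formulation, starting from zero data."""
    x = np.linspace(-0.5, 0.0, n)
    h = x[1] - x[0]
    diff = eps / alpha
    m = n - 1                                   # T unknown at nodes 0..n-2; T = 0 at x = 0
    lap = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
           + np.diag(np.ones(m - 1), -1))
    lap[0, 1] = 2.0                             # ghost node T_{-1} = T_1 + 2 h f
    lap /= h * h
    bc = np.zeros(m)
    bc[0] = 2.0 * f / h                         # ghost-node contribution to T_xx at x = -1/2
    step_inv = np.linalg.inv(np.eye(m) - dt * diff * lap)   # fine for a small sketch

    T, sigma, Z = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(int(round(t_end / dt))):
        vx = (T - sigma) / eps                  # strain rate, since T = sigma + eps v_x
        src = (Z + 1.0) * vx - sigma            # forcing shared by the T and sigma equations
        Z = Z + dt * (-sigma * vx - Z)          # Z equation, using the old sigma
        sigma = sigma + dt * src                # sigma equation
        T[:m] = step_inv @ (T[:m] + dt * src[:m] + dt * diff * bc)   # implicit diffusion
        T[-1] = 0.0
    vx = (T - sigma) / eps
    v = np.concatenate(([0.0], np.cumsum(0.5 * (vx[1:] + vx[:-1]) * h)))
    return x, T, sigma, Z, v


x, T, sigma, Z, v = solve_parabolic()
print(np.max(np.abs(T + 0.5 * x)))   # small: T has relaxed to -f x
print(v[-1])                          # centerline velocity
```

A dense inverse is used only because the grid is tiny; a banded solver would be the natural choice at realistic resolution.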

As our first numerical experiment, we simulated system (C) with $\varepsilon = 0$ using the random choice method. The channel width was chosen so that $\alpha = 1$. The flow was initially in the classical steady state corresponding to the critical pressure gradient $f_{crit} = 1$; then the pressure gradient was increased abruptly to the supercritical value $1.2 f_{crit}$.

The result is shown in Fig. 4. The fluid velocity $v$ is plotted vs. position $x$ at successive time intervals; generally, the velocity increases with time. During the early stages of the experiment, the flow settled into a quasi-steady state. This latency effect is especially evident in a plot of the centerline velocity as a function of time, and it is more pronounced when the channel width ($h$, hence also $\alpha$) is smaller. Eventually, however, a thin layer develops at the plate in which the velocity rises to a value that is nearly constant across the channel. For practical purposes, the fluid has broken free from the plate and is accelerating uniformly under the applied pressure gradient; thus the fluid “slips.” We do not claim to have developed a new theory of wall slip at this point, though this phenomenon has been associated with non-monotone constitutive relations by others [2, 9]. If this connection is to be explored more deeply in the future, it is worth noting the success of the random choice method in the post-critical, $\varepsilon = 0$ regime; it is the only one of our methods that can compute in this range.

Fig. 4: Onset of slip for a fluid without Newtonian viscosity.
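The mechanics of the random choice method can be illustrated on a much simpler problem. The sketch below is not our solver for (C). It substitutes the scalar inviscid Burgers equation $u_t + (u^2/2)_x = 0$, whose Riemann problems have an elementary exact solution, and applies a Glimm-type random choice step on alternating staggered grids. The sampling uses one van der Corput number per time step, shared by all cells. Each new value is the exact Riemann solution sampled at a quasi-random point in the cell. No artificial viscosity is added, so a shock stays perfectly sharp and moves, on average, at the Rankine-Hugoniot speed. All names are illustrative.

```python
import numpy as np


def burgers_riemann(u_left, u_right, xi):
    """Exact Riemann solution of u_t + (u^2/2)_x = 0, sampled at xi = x/t."""
    if u_left > u_right:                         # shock with Rankine-Hugoniot speed
        return u_left if xi < 0.5 * (u_left + u_right) else u_right
    if xi <= u_left:                             # rarefaction fan
        return u_left
    if xi >= u_right:
        return u_right
    return xi


def van_der_corput(n, base=2):
    """n-th element of the van der Corput low-discrepancy sequence in (0, 1)."""
    q, denom = 0.0, 1.0
    while n:
        n, rem = divmod(n, base)
        denom *= base
        q += rem / denom
    return q


def random_choice(u, dx, dt, n_steps):
    """Random choice steps on alternating staggered grids (needs max|u| dt <= dx/2)."""
    u = np.array(u, dtype=float)
    for step in range(n_steps):
        theta = van_der_corput(step + 1)
        xi = (theta - 0.5) * dx / dt             # sample point within the staggered cell
        sampled = np.array([burgers_riemann(u[j], u[j + 1], xi) for j in range(len(u) - 1)])
        if step % 2 == 0:
            u = np.append(sampled, u[-1])        # grid shifts right by dx/2
        else:
            u = np.insert(sampled, 0, u[0])      # grid shifts back by dx/2
    return u


x = np.linspace(-1.0, 1.0, 401)
dx = x[1] - x[0]
dt = 0.5 * dx                                    # |u| <= 1, so max|u| dt <= dx/2
u0 = np.where(x < 0.0, 1.0, 0.0)                 # shock with speed 1/2
u = random_choice(u0, dx, dt, n_steps=200)       # even step count: original grid, t = 0.5
shock = x[np.argmax(u < 0.5)]
print(shock)                                     # exact shock position is 0.25
```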

The same experiment was performed for system (C) with a small, but nonzero, Newtonian viscosity coefficient $\varepsilon$. Fig. 5 shows the results for $\varepsilon = 0.01$, as calculated using the Lax-Wendroff method with Tyler artificial viscosity. What results is evidently a different phenomenon, in which the shorter relaxation response (here modelled by Newtonian viscosity) of the fluid arrests the acceleration in a layer near the wall. Now the layer is much thicker, with its outer boundary corresponding to a discontinuity in the strain rate. The solution approaches a steady state in which $v_x$ is discontinuous but the total stress $T = \sigma(v_x) + \varepsilon v_x$ is continuous. The steady state has the same layer thickness as predicted analytically, but the centerline velocity is 20% too high. This is because the centerline velocity is extremely sensitive to the slope of the velocity profile in the layer, which is affected by the artificial viscosity in the numerical method. The layer formation is the key to our interpretation of the spurt phenomenon.

More extensive experiments were performed using the solid mechanics algorithm. For example, the calculation of Fig. 5 was repeated using this method and a graded mesh of 160 elements; the same layer thickness as shown in Fig. 5 was obtained, and the centerline velocity of the long-time solution differed from the analytic prediction by only about 1%.

Fig. 5: Onset of spurt for a fluid with Newtonian viscosity.

We [...] time to model the effect of a [...] zero relaxation [...]; this is done [...] relaxation time very much shorter than [...] in system (JS), as given in §3. We emphasize that, although the term [...] appears formally as a Newtonian or solvent viscosity [...]

[...] involved in Vinogradov's materials. The samples are labelled [...] by increasing molecular weight, $M$. The following features of Vinogradov's polyisoprene samples were used to determine the physical constants:

1. The elastic modulus, $\mu$, is independent of the molecular weight, $M$.

2. The contribution to the zero shear viscosity from the dominant relaxation [...]

3. [...]

4. For samples PI-3-8, the observed critical stress is not a function of $M$.
Fig. 6: Centerline velocity vs. time.

These observations, and the presumption that the secondary relaxation time and its associated viscosity are independent of $M$, lead to a set of values of $\alpha$ and $\varepsilon$ that decreases with $M$. These values are readily obtainable from our definitions in §3 and the dimensional information given in Ref. [8], where further details on parameter estimation may be found. The results are shown in Figs. 6-8.

Fig. 6 shows the evolution of the spurt process in time; centerline velocity is plotted vs. time for the values of $\alpha$ and $\varepsilon$ of sample PI-7 with $f = 1.2$. The simulations were carried out using zero initial data. The spatial discretization was a graded mesh of 640 elements, with smaller elements near the wall. The maximum velocity in Fig. 6 is scaled by a Newtonian viscous response with viscosity $\varepsilon$ that happens on such a short time scale as not to be distinguishable from the $t = 0$ axis. The period of time from start-up to the onset of spurt at $t = 2.17$ in nondimensional units is the latency period in which a quasi-steady flow exists. Rescaling with appropriate dimensions gives a predicted latency time of 346 s for sample PI-7. The spurt process in Fig. 6 has not been carried out for a long enough time to achieve a very nearly steady state; numerical simulations that run for about 5 more nondimensional time units would be required. Thus we predict that the whole dynamic process takes on the order of forty minutes to unfold for this sample. We have run a sequence of such simulations for each of the eight samples, allowing a sufficiently long time to obtain essentially steady solutions.

Fig. 7: Volumetric flow rate vs. effective shear stress: (a) experiment [18]; (b) numerical calculation [8]. Note that the horizontal scale of panel (b) matches that of panel (a), but the vertical scale does not.

Fig. 7 shows the results of these simulations compared to the data reported in Ref. [18]; volumetric flow rate, normalized in such a way that it has units comparable to shear rate [8], is plotted vs. $T$ at the die or capillary wall. The value of $T$ can be deduced from the pressure drop using the relation of Eq. 3.11.

Fig. 8 shows the result of simulating a loading sequence in which the pressure gradient f̄ is increased in small steps, allowing sufficient time between steps to achieve steady flow [8]. The loading sequence is followed by a similar unloading sequence, in which the driving gradient is decreased in steps. The initial step used zero initial data, and each succeeding step used the result of the previous step as initial data. The resulting hysteresis loop exhibits features similar to those observed by Hunter and Slemrod in their model [4] (see system 4.5 above), which they called "shape memory." The width of the hysteresis loop at the bottom can be related directly to the molecular weight of the sample [8].
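The loading/unloading protocol described above can be illustrated with any bistable model: each step of the driving parameter starts from the steady state reached at the previous step. The sketch below is a toy, not the Johnson-Segalman system. It uses the scalar equation dx/dt = f + x − x³, whose steady states x³ − x = f form an S-shaped curve, and relaxes each step with explicit Euler. The step size 0.1, the time step and the relaxation length are arbitrary choices, and the function names are hypothetical. The toy shows only the generic mechanism, in which a branch persists until it disappears. It does not reproduce the particular loop shape of Fig. 8.

```python
def relax(f, x, dt=0.01, steps=5000):
    """March dx/dt = f + x - x**3 (a toy bistable model) to a near-steady state."""
    for _ in range(steps):
        x += dt * (f + x - x ** 3)
    return x


def load_unload(f_values, x_start=0.0):
    """Step f up through f_values and back down, reusing each steady state as
    the initial data for the next step (zero initial data for the first)."""
    x = x_start
    loading, unloading = [], []
    for f in f_values:
        x = relax(f, x)
        loading.append((f, x))
    for f in reversed(f_values):
        x = relax(f, x)
        unloading.append((f, x))
    return loading, unloading


f_values = [k / 10 for k in range(-10, 11)]   # -1.0, -0.9, ..., 1.0
loading, unloading = load_unload(f_values)
x_load = dict(loading)[0.0]       # state at f = 0 on the way up
x_unload = dict(unloading)[0.0]   # state at f = 0 on the way down
print(f"f = 0: loading x = {x_load:.3f}, unloading x = {x_unload:.3f}")
```

At f = 0 the loading branch sits at x = −1 and the unloading branch at x = +1. The same driving parameter therefore yields two different steady states, depending on the loading history.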

Fig. 8: Hysteresis under cyclic loading.

We have performed careful numerical experiments to test the validity of the results we report here. One of the questions we sought to resolve involves the oscillations evident in Fig. 6 during the spurt process. In Ref. [8], results were reported on meshes much cruder than the one used to compute the results of Fig. 6; the oscillations were larger in amplitude and did not diminish with refinement of the time step. Fig. 6 shows that these oscillations diminish with refinement of the grid, and we are led to conclude that the oscillations reported in Ref. [8] are induced by spatial discretization error. This conclusion is reinforced by inspection of Fig. 6: the larger oscillations occur at later times, when the layer boundary moves toward the interior of the die, where the elements of the graded mesh are larger. Eventually these larger oscillations are damped, as they are when meshes consisting entirely of larger elements are used. Our mesh refinement studies lead us to infer that crude spatial resolution can produce spurious oscillations in spurt dynamics; these oscillate about the correct mean value and still lead to accurately represented steady states. These conclusions have been confirmed by reproducing the results just described using the parabolic formulation (system 5.1). The mesh was refined to 3072 equal-sized cells, and a weak stability condition relating time step to cell size was found. If this condition is violated, spurt appears to occur prematurely with the parabolic method on fine grids. However, when the time step is refined on the finest grid, the results obtained with the solid mechanics method and the parabolic method agree to at least graphical accuracy, and both give virtually the same estimate of latency time.
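The text does not state the grading law of the 640-element mesh, only that the elements are smaller near the wall. The following is a minimal sketch, assuming geometric grading with a hypothetical ratio of 0.995 between neighbouring elements, the centreline at x = 0 and the wall at x = 1. The function name and all parameter values are illustrative only.

```python
def graded_mesh(n_elements=640, length=1.0, ratio=0.995):
    """Nodes from the centreline (x = 0) to the wall (x = length).

    Element k has size proportional to ratio**k, so with ratio < 1 the
    elements shrink geometrically toward the wall.
    """
    weights = [ratio ** k for k in range(n_elements)]
    scale = length / sum(weights)
    sizes = [w * scale for w in weights]
    nodes = [0.0]
    for h in sizes:
        nodes.append(nodes[-1] + h)
    nodes[-1] = length  # remove round-off so the wall sits exactly at `length`
    return nodes, sizes


nodes, sizes = graded_mesh()
print(len(sizes), f"largest/smallest element = {sizes[0] / sizes[-1]:.1f}")
```

Calling `graded_mesh(3072, ratio=1.0)` gives the uniform mesh of equal-sized cells used for the parabolic formulation.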
In the region of parameter space characteristic of Vinogradov's data, much can be deduced about the features of system (JS) without recourse to computed results. The deductions which follow were, however, guided by detailed study of the results of numerical simulation. First, since α is so small (of order 10“^^), Eq. 3.11 holds virtually instantaneously and for all time. Thus in system 5.1 the first equation may be eliminated, and T becomes a parameter whose value at any point in the die is given by Eq. (3.11). The resulting system of two ODEs can be analyzed completely by a phase-plane analysis [10], which shows a single attractor for 0 < T < 1/2, giving pre-critical solutions not involving spurt. If 1/2 < T < 1 and ε < 1/8, there are two stable attractors: one not involving spurt and one in which spurt has taken place. Latency can be interpreted as the time during which the system stays near the first attractor, which we call the "latent attractor." When T > 1 and ε < 1/8 there is no latent attractor; this result can be confirmed numerically. Furthermore, for all fluids PI-3-8 that exhibit spurt, ε ≪ 1. An asymptotic expansion of the solution of the two ODEs in system 5.1 for small ε gives quantitative estimates of dynamic behavior during latency and of the resulting asymptotic steady states [10]. This asymptotic analysis can also predict flow rates in steady states (and thus reproduce Fig. 7(b)) and predict the shape memory in hysteresis. It is easy to see that Z satisfies the ODE


$$Z_t = -\frac{Z^{2} + Z + T^{2}}{Z + 1} \qquad (5.2)$$


at zero order in ε (i.e., to O(ε) accuracy) near the latent attractor. The latency time is the time during which σ retains a relatively constant value of approximately T and grows only at first order in ε, while Z grows at zero order, governed by Eq. 5.2. An initial value of Z = 0 at t = 0 for Eq. 5.2 is not appropriate, because the asymptotic expansion leading to that equation is only valid near the latent attractor. However, an early-time expansion can be developed that is valid during the initial Newtonian response alluded to in connection with Fig. 6 [10]. On this time scale (which is of order ε), σ_t and Z_t are O(ε^-1) while σ and Z are O(1). This leads to


$$Z \approx (1 - T^{2})^{1/2} - 1 \quad \text{at } t \approx 0 \qquad (5.3)$$

When Eq. 5.3 is used as an initial condition for Eq. 5.2 at t = 0, the latency time is estimated as the time at which Z = −1. This may be calculated directly by solving Eq. 5.2 for t = t(Z); see [10]. For the case of Fig. 6, where |T| = f̄/2 = 0.6, the result is a predicted latency time of 366 sec., which compares very favorably with the value of 346 sec. obtained from the full simulation.
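Eqs. (5.2) and (5.3) can be integrated in closed form. Writing Z² + Z + T² = (Z + 1/2)² + (T² − 1/4), the latency time is the integral of (Z + 1)/(Z² + Z + T²) from Z = −1 to Z(0). For 1/2 < |T| < 1 this reduces to a logarithm plus an arctangent. The sketch below evaluates that integral. It is our own quadrature of the reduced equations, not the computation of Ref. [10], and the function name is hypothetical. The conversion to seconds uses the ratio 346 sec. / 2.17 implied by the full simulation of Fig. 6; that ratio is an inference, not a quantity stated in the text. For |T| = 0.6 the integral gives t ≈ 2.30, or about 367 sec., in line with the 366 sec. quoted above.

```python
import math


def latency_time(T):
    """Nondimensional latency time predicted by Eqs. (5.2)-(5.3).

    Integrates dt = dZ / Z_t from Z(0) = (1 - T^2)^(1/2) - 1 down to Z = -1,
    with Z_t = -(Z^2 + Z + T^2) / (Z + 1). Valid only for 1/2 < |T| < 1.
    """
    T = abs(T)
    if not 0.5 < T < 1.0:
        raise ValueError("latency estimate requires 1/2 < |T| < 1")
    z0 = math.sqrt(1.0 - T * T) - 1.0   # Eq. (5.3)
    b = math.sqrt(T * T - 0.25)         # Z^2 + Z + T^2 = (Z + 1/2)^2 + b^2

    def antiderivative(z):
        # derivative with respect to z is (z + 1) / (z^2 + z + T^2)
        w = z + 0.5
        return 0.5 * math.log(w * w + b * b) + (0.5 / b) * math.atan(w / b)

    return antiderivative(z0) - antiderivative(-1.0)


t_star = latency_time(0.6)            # the case of Fig. 6, |T| = 0.6
seconds_per_unit = 346.0 / 2.17       # inferred from the simulated onset (assumption)
print(round(t_star, 3), round(t_star * seconds_per_unit))
```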

In Ref. [8], several possible experiments are suggested that could verify the interpretation of spurt put forward here; the key experiment suggested is the verification of the molecular-weight dependence of the widest point of the hysteresis loop of Fig. 8. We remark that the shape of this loop is a key feature of "shape memory," in that the loop always opens from the point at which unloading begins. This occurs as the solutions proceed from "top-jumping" in Fig. 2, through intermediate convexifications of the curve, to "bottom-jumping" at the point where a discontinuity in slope can be seen in the back part of the loop in Fig. 8. This is in distinct contrast to the interpretation of Ref. [12], where bottom-jumping is always the rule for steady spurt solutions and portions of the hysteresis loop are retraced during unloading. To these experimentally verifiable signatures of our model, the current analysis allows us to add more. When α and ε are sufficiently small, as they are with PI-3-8, latency time should be rather easy to measure, since a very slow flow with comparatively little throughput can persist for many minutes before dramatic growth occurs. We predict that latency can only occur for samples with ε sufficiently small (1/8 or less for J-S, but the precise number may be model-sensitive). For J-S, it can only occur for stresses in the range 1/2 < T < 1. It should scale with A“^ at fixed T and obey Eqs. (5.2) and (5.3) approximately.
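The stress ranges quoted above follow from the reduced equations and can be summarized in a few lines. The quadratic Z² + Z + T² in Eq. (5.2) has a real root only for |T| ≤ 1/2; in that case Z settles at the root and never reaches −1. The initial value (1 − T²)^(1/2) − 1 of Eq. (5.3) is real only for |T| ≤ 1. The ε threshold of 1/8 is our own derivation, and it assumes the J-S steady shear stress takes the scaled form s/(1 + s²) + εs in the shear rate s. That curve is non-monotone exactly when ε < 1/8. The function name and the regime labels are hypothetical.

```python
def spurt_regime(T, eps, eps_crit=1.0 / 8.0):
    """Zero-order regime suggested by the reduced analysis (Eqs. 5.2 and 5.3)."""
    T = abs(T)
    if eps >= eps_crit:
        # steady stress s/(1 + s^2) + eps*s is then monotone in the shear rate s
        return "no spurt"
    if T <= 0.5:
        # Z^2 + Z + T^2 = 0 has a real root, so Z settles before reaching -1
        return "no spurt"
    if T < 1.0:
        # no real root: Z falls from (1 - T^2)^(1/2) - 1 to -1 in finite time
        return "latency then spurt"
    # (1 - T^2)^(1/2) in Eq. (5.3) is no longer real: no latent attractor
    return "spurt without latency"


for T in (0.3, 0.6, 1.2):
    print(T, spurt_regime(T, eps=0.01))
```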

6.  Conclusions 

Well-posed  dynamical  problems  based  on  non-monotone  constitutive  relations  need 
not  be  unphysical.  In  fact,  our  Johnson-Segalman  model  provides  a  relatively  simple 
example  which  accurately  describes  spurt.  Other  models,  based  on  more  sophisticated 
molecular  theory,  appear  to  have  similar  features  [8]  which  require  further  investigation. 
In addition to reproducing spurt, our approach leads to results which suggest new experiments,
as discussed in §5. Our numerical approaches to the fully time-dependent problem
for  a  J-S  fluid  at  a  high  Deborah  number  avoid  the  “high  Weissenberg/Deborah  number 
problem”  at  least  in  1-D.  We  are  currently  investigating  generalizations  of  our  approaches 
to  multi-D  problems  of  physical  interest. 
ABSTRACT


The one-dimensional diffusion equation, stated as an algebraic expression (Part I) as

$$T(N,P) = \frac{\varphi}{2^{h}}\left(A + Bm + Cm^{2} + \cdots\right),$$

is transposed into a form in which A, B, C, etc. are whole numbers. It is shown that the correlations among the sequenced coefficients A, B, C, etc. are Pascal triangle and/or arithmetic square terms. Furthermore, these terms are expressible as Pochhammer relations.
INTRODUCTION


Part I of this report indicated a novel solution to the basic diffusion equation of physics, where the field boundary extends from zero to positive infinity. The nodal points of the field net are identified as terminating polynomials, with the numerators of the coefficients found first by deduction (for the lower orders) and then by extrapolation. Part II considers the numerical analysis employed to complete the entire set of tables.

PROCEDURE 


A.  Direct  Differences 

The polynomial form of the discrete solutions, Eq. (1), of the diffusion equation is formulated from the Schmidt plot geometry, as described in References (1) and (2). This formulation reflects a progressive trigonometric construction in which the degree and the number of terms increase with time and decrease with distance, time and distance referring to the unsteady heat flow application.


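$$T(N,P) = \frac{\varphi}{2^{h}}\left[\frac{A}{2^{k}} + \frac{A_{1}m}{2^{k-1}} + \frac{A_{2}m^{2}}{2^{k-2}} + \cdots + A_{k}m^{k}\right] \qquad (1)$$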


where m is the independent variable,

N is a distance index,

P is a time index,

$$h = \frac{(P+N) - 2 - \left|\sin\frac{(P+N)\pi}{2}\right|}{2},$$

j = (k − term exponent of m), the individual term denominator exponent,
and φ = m(T_0 − T_{N,0}), with

A, A_1, A_2, ... the numerical coefficients of the interior terms of the equation.
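The Schmidt plot is the classical graphical construction for unsteady conduction. Each new nodal temperature is the average of its two neighbours at the previous time step, which is the explicit finite-difference scheme with Fourier modulus 1/2. The sketch below is our illustration, not the report's procedure. It carries out that averaging in exact rational arithmetic for a semi-infinite field whose surface is held at a unit temperature step. Every nodal value comes out as a whole number over a power of 2, which is consistent with the powers of 2 in the denominators of Eq. (1). The report's boundary treatment and the role of m are not reproduced here. The far-end node is simply held at the initial value, with enough nodes that the disturbance never reaches it. The function name is hypothetical.

```python
from fractions import Fraction


def schmidt_plot(n_nodes, n_steps, surface=Fraction(1)):
    """Exact Schmidt-plot history for a step change in surface temperature.

    Node 0 is the surface, held at `surface`; all other nodes start at 0.
    The last node is held at 0, so n_nodes should exceed n_steps + 1.
    """
    T = [surface] + [Fraction(0)] * (n_nodes - 1)
    history = [T[:]]
    for _ in range(n_steps):
        new = T[:]
        for i in range(1, n_nodes - 1):
            # Schmidt rule: new value is the mean of the two neighbours at the old time
            new[i] = (T[i - 1] + T[i + 1]) / 2
        T = new
        history.append(T[:])
    return history


for p, row in enumerate(schmidt_plot(n_nodes=6, n_steps=3)):
    print(p, [str(t) for t in row])
```

After three steps the nodal values are 1, 5/8, 1/4, 1/8, 0, 0.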

The numerators of each term of the "nodal" equations are uniquely related to the adjacent time and distance term coefficients of the same degree. This relation, originally found accidentally, is correlated by the table of differences shown in Tables 1 and 1-A. There, the boxed vertical sequence 17548, 25147, 35401, 49024 and 66868 is established by the reduction of the Schmidt plot through the trigonometric analysis. Step-wise right-moving subtraction generates a column of residual zeroes, an adjoining column of ones, and a digital sequence identified as "IV" in Table 1. Reducing this column "IV" to zero vertically then allows a corresponding determination of the particular values of the entire matrix from inspection of the biased rows I, II and III.
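Forward differencing of this kind is easy to mechanize. The sketch below uses our own hypothetical helper, not the report's procedure, to build the successive difference columns of a sequence. Applied to the boxed sequence quoted above, it gives first differences 7599, 10254, 13623 and 17844, and a single fourth difference of 138. Table 1 itself, and the right-moving subtraction across its rows, are not reproduced here.

```python
def difference_table(seq):
    """Return [seq, first differences, second differences, ...]."""
    table = [list(seq)]
    while len(table[-1]) > 1:
        col = table[-1]
        table.append([b - a for a, b in zip(col, col[1:])])
    return table


boxed = [17548, 25147, 35401, 49024, 66868]   # boxed sequence of Table 1
for order, col in enumerate(difference_table(boxed)):
    print(order, col)
```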


1. William F. Donovan, "Determination of Heat Transfer Coefficient in a Gun Barrel from Experimental Data," Memorandum Report BRL-MR-3428, January 1985 (AD A151315).


2. William F. Donovan, "Polynomial Definition of Discrete Field Points of Map of Diffusion Equation, Part I," Memorandum Report BRL-MR-3649.
Table 1-A (continuation of Table 1)
B. Summing Progression


Table 1 can be rewritten as Table 2, with the biased rows horizontal and the zero column left-justified instead of right-justified. In this form, the sum of any two adjacent column values of any row gives the value of the next row entry directly under the right-hand addend. The sequence of Row I, regardless of the degree of the term represented by the matrix, always starts with zero and then maintains alternate zeroes to infinity. Each of the difference tables corresponding to a given "m" exponent (Eq. (1)) can be resolved similarly, except that each first row is carried in a unique progression. This progression is repeated without the interspersed zeroes within the matrix.
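The summing rule can be sketched as follows. Each new entry is the sum of two adjacent entries of the row above and is written under the right-hand one of the pair. The text does not say what happens in the leftmost column, so this sketch simply carries that entry down unchanged; that choice is an assumption. Started from the hypothetical first row 1, 0, 0, 0, 0, the rule regenerates the rows of the Pascal triangle.

```python
def summing_progression(first_row, n_rows):
    """Rows generated by the summing rule of Table 2.

    Entry j of a new row (j >= 1) is the sum of entries j - 1 and j of the row
    above, i.e. it is written under the right-hand addend. The leftmost entry
    has no left neighbour; carrying it down unchanged is an assumption.
    """
    rows = [list(first_row)]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        rows.append([prev[0]] + [prev[j - 1] + prev[j] for j in range(1, len(prev))])
    return rows


for row in summing_progression([1, 0, 0, 0, 0], 5):
    print(row)
```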

C.  Pascal  Triangles 

The classical Pascal triangle, Table 3, can be formed by simple addition, where each term is the sum of the two previous superior terms. Individual entries are represented by the binomial coefficient

$$\binom{z}{w} = \frac{z!}{w!\,(z-w)!},$$

where z and w represent the row and column of a particular coefficient of a binomial expansion. A modification of the Pascal triangle is found by writing the diagonals as rows, which then generates the "arithmetic square", Table 4, also known historically (Reference 3).

It is precisely these progressions, alternating with zeroes, which comprise the first rows of the individual "summing progressions" shown as Table 2. In this case the digital enumeration of the rows indicates the degree of the "m" term.
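Both constructions can be checked by machine. In the minimal sketch below (function names are ours), the triangle is built by the addition rule. The arithmetic square is then read off by taking diagonal d of the triangle, that is the entries C(d, d), C(d + 1, d), C(d + 2, d), ..., as row d. Row d therefore starts 1, d + 1, and so on. This text does not show whether Table 4 also lists the all-ones row d = 0, so in this sketch the Appendix's degree-r sequence appears as row r + 1.

```python
def pascal_triangle(n_rows):
    """Rows 0 .. n_rows - 1 of the Pascal triangle, built by addition only."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # each interior entry is the sum of the two entries above it
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows


def arithmetic_square(size):
    """size x size arithmetic square: diagonal d of the triangle becomes row d,
    so entry (d, c) is C(d + c, d)."""
    tri = pascal_triangle(2 * size)
    return [[tri[d + c][d] for c in range(size)] for d in range(size)]


for row in arithmetic_square(5):
    print(row)
```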

D. Pochhammer's Symbol

Each row of Table 4 can be examined by finite differencing to establish, via the Gregory-Newton formula, a definitive polynomial expansion extending to infinity. Furthermore, the algebraic equation can be simplified to a factorial form known as Pochhammer's Symbol, or rising factorial. Appendix A presents an example of such a development for "m" degrees zero through three. The complete arithmetic square, Table 2, can then be written as

$$f(v) = \frac{1}{(r+1)!}\,(v+1)(v+2)\cdots(v+r+1)$$

where

r = degree of m,

v = column value,

and f(v) = row value of Table 2. From Table 2, a complete construction of Table 1 follows.


3. N. Ya. Vilenkin, Combinatorics, Academic Press, New York & London, 1971, pp. 90-94.

4. M. R. Spiegel, Theory and Problems of Finite Differences and Finite Difference Equations, Schaum's Outline Series in Mathematics, McGraw-Hill Book Company, New York, etc., 1971, pp. 36-44.
Table 3. Pascal Triangle
Table 4. Arithmetic Square
Summary

A straightforward method (arithmetic squares) is described that permits the numerical construction of the differencing tables of Part I of this report.

The derivation through the Pascal triangle and the correspondence to Pochhammer's Symbol notation are demonstrated.
APPENDIX A


Determination of Pochhammer's Notation

A discussion of the Gregory-Newton analysis is presented in Reference 4. It consists of determining a polynomial expression to represent a progressive sequence of numbers. In the present application, it is used to examine the base row development of Table 4.


Given a unit stepping difference in a counting reference, v, and a matched sequence f(v):

$$f(v) = f(0) + \frac{\Delta f(0)}{1!}\,v^{(1)} + \frac{\Delta^{2} f(0)}{2!}\,v^{(2)} + \frac{\Delta^{3} f(0)}{3!}\,v^{(3)} + \cdots \qquad (2)$$

where

v is the step level,

f(v) is the dependent variable,

Δ^k f(0) are the diagonal values of the difference table, and

v^(0) = 1, v^(1) = v, v^(2) = v(v − 1), v^(3) = v(v − 1)(v − 2), etc.

For the case of "m" degree zero, where f(v) is from Table 4:

   v    f(v)   Δf(v)   Δ²f(v)
   0      1      1       0
   1      2      1       0
   2      3      1       0
   3      4      1       0
   4      5      1       0
   5      6      1       0
   6      7      1       0
   7      8      1       0
   8      9      1       0
   9     10      1       0
  10     11      1       0
  11     12      1       0

$$f(v) = f(0) + \frac{\Delta f(0)}{1!}\,v = 1 + v + 0 = (v + 1)$$

For m degree 1:

   v    f(v)   Δf(v)   Δ²f(v)
   0      1      2       1
   1      3      3       1
   2      6      4       1
   3     10      5
   4     15

$$f(v) = 1 + 2v + \frac{1}{2}\,v(v-1) + 0 = \frac{1}{2}\,(v^{2} + 3v + 2) = \frac{1}{2}\,(v+1)(v+2)$$


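For m degree 2:

   v    f(v)   Δf(v)   Δ²f(v)   Δ³f(v)
   0      1      3       3        1
   1      4      6       4        1
   2     10     10       5        1
   3     20     15       6
   4     35     21
   5     56

$$\begin{aligned} f(v) &= 1 + 3v + \frac{3}{2}\,v(v-1) + \frac{1}{6}\,v(v-1)(v-2) + 0 \\ &= 1 + 3v + \frac{3}{2}\,(v^{2} - v) + \frac{1}{6}\,(v^{3} - 3v^{2} + 2v) \\ &= \frac{1}{6}\,(v^{3} + 6v^{2} + 11v + 6) \\ &= \frac{1}{6}\,(v+1)(v+2)(v+3) \end{aligned}$$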


The pattern continues so that:

  Degree of "m", r    f(v)
        0             (v + 1)
        1             (1/2)(v + 1)(v + 2)
        2             (1/6)(v + 1)(v + 2)(v + 3)
        3             (1/24)(v + 1)(v + 2)(v + 3)(v + 4)
        4             (1/120)(v + 1)(v + 2)(v + 3)(v + 4)(v + 5)
        5             (1/720)(v + 1)(v + 2)(v + 3)(v + 4)(v + 5)(v + 6)


and the general expression is

$$f(v) = \frac{1}{(r+1)!}\,(v)_{r+1}$$

where

$$(v)_{r+1} = (v+1)(v+2)(v+3)\cdots(v+r+1).$$

With respect to the original time index, P, of the diffusion equation polynomial, v = P − 1 by Table 3, and

$$(v)_{r+1} = (P-1+1)(P-1+2)\cdots(P-1+r+1) = P(P+1)(P+2)(P+3)\cdots(P+r)$$

so that

$$f(v) = \frac{1}{(r+1)!}\,(P)_{r+1},$$

which is known as Pochhammer's Symbol.*


* Granino A. Korn, Mathematical Handbook for Scientists and Engineers, McGraw-Hill Book Co., Inc., New York, etc., 1961.
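The Appendix derivation can be checked numerically. The sketch below (helper names are ours) takes the first few entries of each degree-r row, which are assumed here to be f(v) = C(v + r + 1, r + 1) as the worked cases above indicate. It then extracts the leading differences Δ^k f(0), evaluates Eq. (2), and compares the result with (P)_{r+1}/(r + 1)! for P = v + 1.

```python
from fractions import Fraction
from math import comb, factorial


def forward_differences(values):
    """Leading entries f(0), Δf(0), Δ²f(0), ... of the difference table."""
    diag, row = [], list(values)
    while row:
        diag.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return diag


def falling(v, k):
    """Falling factorial v^(k) = v(v - 1)...(v - k + 1), with v^(0) = 1."""
    out = 1
    for i in range(k):
        out *= v - i
    return out


def rising(x, n):
    """Pochhammer symbol (x)_n = x(x + 1)...(x + n - 1)."""
    out = 1
    for i in range(n):
        out *= x + i
    return out


def gregory_newton(diag, v):
    """Evaluate Eq. (2): sum over k of Δ^k f(0) v^(k) / k!."""
    return sum(Fraction(d * falling(v, k), factorial(k)) for k, d in enumerate(diag))


def row_of_table4(r, length):
    """Degree-r sequence as read from the Appendix: f(v) = C(v + r + 1, r + 1)."""
    return [comb(v + r + 1, r + 1) for v in range(length)]


def check_degree(r, v_max=20):
    """True if Eq. (2) and the Pochhammer form agree for v = 0 .. v_max."""
    diag = forward_differences(row_of_table4(r, r + 3))
    return all(
        gregory_newton(diag, v) == Fraction(rising(v + 1, r + 1), factorial(r + 1))
        for v in range(v_max + 1)
    )


print([check_degree(r) for r in range(6)])
```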
List of Symbols


  h                   exponent of 2 in external denominator
  j                   exponent of 2 in each term denominator
  k                   exponent of "m" in final term
  m                   independent variable
  r                   degree of "m"
  v                   column value of stepping sequence
  f(v)                row value of stepping sequence
  w                   inferior component of binomial coefficient
  z                   superior component of binomial coefficient
  A1, A2, A3, ...     numerical coefficients
  N                   distance index
  P                   time index
  T                   dependent variable
  φ                   external numerator
Annulus-based Inclusion Testing for Multiply-Connected Sets


Terence  M.  Cronin 

US  Army  CECOM  Center  for  Signals  Warfare 
Vint  Hill  Farms  Station 
Warrenton  VA  22186 


Abstract:  A  new  data  structure  is  introduced,  as  a  vehicle  to  test  for  metrical  inclusion  of  an  arbitrary 
point  within  a  potentially  multiply-connected  closed  curve.  From  graph  theory  and  topology,  we  know 
that  if  a  line  drawn  from  a  point  through  a  simply-connected  closed  contour  non-tangentially  intersects 
the  contour  an  even  number  of  times,  then  the  point  is  on  the  exterior;  otherwise  it  lies  within  the 
interior  (the  parity  algorithm).  This  theoretical  result  is  powerful,  but  fails  for  multiply-connected 
curves.  It  also  does  not  provide  information  about  either  the  distance  or  direction  from  a  point  to  a 
contour.  A  proposed  solution  incorporates  a  new  structure  called  the  inner  annulus,  which  is  computed 
with  a  corresponding  generator  function,  using  as  input  the  digital  representation  of  a  closed  Jordan 
curve.  The  annulus  can  be  viewed  as  the  set  of  points  which  are  4-connected  to  the  inside  edge  of  the 
contour.  It  is  algorithmically  generated  by  traversing  the  inside  edge  of  the  contour  in  a 
counterclockwise  fashion,  and  collecting  the  pixels  visited,  until  the  start  point  is  seen  again.  During 
this  process,  the  interior  of  the  contour  is  always  to  the  left.  Once  the  annulus  is  constructed,  a  test  is 
made  to  determine  if  an  arbitrary  coordinate  is  nearer  the  original  contour  or  its  inner  annulus,  which 
determines  respectively  whether  the  point  is  exterior  or  interior  to  the  contour.  The  same  technique 
may  be  applied  to  any  holes  contained  in  the  contour,  so  that  multiply-connected  sets  are 
accommodated.  The  computational  complexity  of  the  algorithm  is  analyzed  in  terms  of  time,  space,  and 
preprocessing requirements. A conjecture is posed that expresses the length of the inner annulus in terms of
the length and shape of the original contour. An attempt to prove the conjecture has yielded a formal
characterization  of  the  shape  of  a  contour,  in  terms  of  the  convexities  and  concavities  exhibited  by  the 
boundary  of  the  contour. 

Introduction: High-level map reasoning is a problem area which lacks a complete mathematical
formalization. This paper describes a new tool, termed annulus-based inclusion testing, built upon a
foundation of topology and geometry. The tool addresses only one small portion of the spatial
reasoning problem: that of metrical interior/exterior region discrimination for multiply-connected
contours. Any holes contained in the parent contour are cross-referenced by a graph-theoretic structure
called a feature orientation lattice. Another tool under development, called equidistance
loci-reduction, is designed to rapidly render the contour nearest an arbitrary map coordinate, as well as
the relative direction and distance to the coordinate [C1]. The flow of logic is as follows. Loci-reduction
serves  as  a  filter  to  rapidly  pinpoint  the  nearest  topological  feature  to  an  arbitrary  point.  If  the  feature 
is  a  closed  contour,  annulus-based  inclusion  testing  decides  whether  the  point  lies  within  the  boundary 
of  the  feature,  and  if  so,  where  inside.  The  feature  orientation  lattice  provides  subordination  and 
orientation  relationships  among  the  parent  contour  and  any  contours  which  it  may  contain.  Figure  1 
illustrates  the  way  in  which  the  tools  are  used  as  feeder  technologies  for  automated  map  reasoning. 


1.0 Problem Definition: Metrical Inclusion.

Given  a  digitized  representation  of  a  possibly  multiply-connected  closed  Jordan  curve  and  an  arbitrary 
point,  develop  a  reliable,  computationally  efficient  technique  to  decide  whether  the  point  is  interior  or 
exterior to the curve, while at the same time providing the respective distance and direction from the
curve  and  any  contours  which  it  may  contain. 


Figure  1.  Mathematical  Tools  to  Support  Automated  Map  Understanding 


1.1 Distance Calculations on a Binary Map.

There  are  several  distance  metrics  discussed  in  the  digital  image  processing  literature.  A  good 
compendium  of  these  is  contained  in  [G4].  The  distance  used  here  is  the  city-block  distance,  also 
referred to as the d4 distance [R2]. This metric is used for point-to-point, point-to-contour, and
contour-to-contour  distance  measurements.  The  discussion  which  follows  refers  to  trace  contours;  a 
trace  contour  is  the  boundary  of  a  spatial  feature  represented  on  a  binary  map. 

Definition  1.1.1.  Distance  Between  Two  Points. 

Let p1 = (x1, y1) and p2 = (x2, y2) be arbitrary points. Then the d4 distance between p1 and p2 is
defined:

$$d_4[p_1, p_2] = |x_1 - x_2| + |y_1 - y_2|$$
Definition  1.1.2.  Distance  from  a  Point  to  a  Contour. 

Let Tj be a trace contour, and (x, y) an arbitrary coordinate. Then the distance from (x, y) to Tj is defined:

$$d_4[(x,y), T_j] = \min\{|x_i - x| + |y_i - y|\} \quad \forall \text{ points } (x_i, y_i) \in T_j$$

Definition  1.1.3.  Distance  Between  Two  Contours. 
Let Ti and Tj be trace contours. Then the distance from Ti to Tj is defined:

$$d_4[T_i, T_j] = \min\{d_4[(x,y), T_j]\} \quad \forall \text{ points } (x,y) \in T_i$$
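Definitions 1.1.1 to 1.1.3 translate directly into code. The sketch below is a brute-force transcription; its cost is proportional to |Ti|·|Tj| for the contour-to-contour distance. The function names and the small square contour used as example data are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[int, int]


def d4_points(p1: Point, p2: Point) -> int:
    """Definition 1.1.1: city-block distance between two pixels."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])


def d4_point_contour(p: Point, contour: Iterable[Point]) -> int:
    """Definition 1.1.2: distance from a pixel to the nearest pixel of a contour."""
    return min(d4_points(p, q) for q in contour)


def d4_contours(ti: Iterable[Point], tj: Sequence[Point]) -> int:
    """Definition 1.1.3: minimum over the pixels of T_i of their distance to T_j."""
    return min(d4_point_contour(p, tj) for p in ti)


# Example data: boundary pixels of a 4 x 4 block.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3),
          (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
print(d4_points((1, 2), (4, 0)), d4_point_contour((5, 1), square),
      d4_contours([(5, 1), (6, 1)], square))
```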

1.2 Computing the Relative Direction of an Arbitrary Point From a Contour.

Let (x1, y1) and (x2, y2) be arbitrary points. Let Δx = x1 − x2 and Δy = y1 − y2. Then the relative
direction of (x1, y1) from (x2, y2), denoted dir[(x1, y1), (x2, y2)], is defined piecewise by the following
function:

  Conditions               dir[(x1, y1), (x2, y2)]
  Δx > 0 and Δy > 0        NE
  Δx > 0 and Δy = 0        E
  Δx > 0 and Δy < 0        SE
  Δx = 0 and Δy < 0        S
  Δx < 0 and Δy < 0        SW
  Δx < 0 and Δy = 0        W
  Δx < 0 and Δy > 0        NW
  Δx = 0 and Δy > 0        N

Definition 1.2. The relative direction of a point (x, y) from a contour T, denoted dir[(x, y), T], is defined:

$$\mathrm{dir}[(x,y), T] = \mathrm{dir}[(x,y), (x_c,y_c)] \mid (x_c,y_c) \in T \wedge d_4[(x,y), T] = d_4[(x,y), (x_c,y_c)]$$
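The direction table and Definition 1.2 can be sketched as follows, assuming y increases northward as the table implies. Several contour pixels can tie at the minimum d4 distance, and the definition does not say which one to use. This sketch therefore returns the set of directions to all nearest pixels, which is our reading rather than the paper's stated rule. The function names and the example contour are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple

Point = Tuple[int, int]


def direction(p1: Point, p2: Point) -> Optional[str]:
    """dir[(x1, y1), (x2, y2)]: compass direction of p1 as seen from p2
    (y increases northward); None when the two points coincide."""
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    ns = "N" if dy > 0 else ("S" if dy < 0 else "")
    ew = "E" if dx > 0 else ("W" if dx < 0 else "")
    return (ns + ew) or None


def direction_from_contour(p: Point, contour: Sequence[Point]) -> Set[Optional[str]]:
    """Definition 1.2, returning the direction to every nearest contour pixel."""
    dists = [abs(p[0] - q[0]) + abs(p[1] - q[1]) for q in contour]
    best = min(dists)
    return {direction(p, q) for q, d in zip(contour, dists) if d == best}


square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3),
          (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
print(direction_from_contour((5, 1), square), direction_from_contour((4, 4), square))
```

For a point lying on the contour itself, the nearest pixel is the point, and the returned set is {None}.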


2.0 Automated Discrimination of the Interior and Exterior of a Closed Contour.

2.1  Other  Approaches  to  the  Problem. 

Deciding whether a point lies inside or outside a closed contour is a problem whose implementation is
non-trivial, especially when the caveat is added that the contour may be multiply-connected. Other
techniques address elements of Problem Definition 1.0; however, these approaches are of
limited utility. One such technique is the odd-even count contour-crossing technique, also known as the
parity algorithm [S1]. A potentially powerful approach to the problem, this technique "draws a line"
from the coordinate in question "through" a contour and counts the number of times the contour is
crossed. The technique answers in the affirmative if the number of crossings is odd, and in the negative
otherwise. This is an example of a technique which avails itself of elegant theoretical results from graph
theory and topology, but whose implementation is fraught with error. The problems with the
technique are evident from the words which are double-quoted. First, an implementation must
accommodate drawing a line in the appropriate direction to "appropriately" intersect the contour.
Digitized contours of general complexity are frequently convoluted in such a way that a true crossing is
not detected, or a false crossing is counted as a valid one. Secondly, if a contour is multiply-connected, it
is possible for a point lying within one of the "holes" to be considered outside the parent contour,
which is a theoretically correct response but a failure from a pragmatic stance. This can have serious
ramifications in practice: for example, a boat on Great Salt Lake would not be contained within the
state of Utah. Another shortcoming is that the decision relies upon a simplistic count and returns no
metrical information about distance or direction from contours.
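The parity algorithm is sketched below for polygons given by their vertices, which is the standard even-odd ray test rather than a pixel-contour implementation. A half-open rule on the ray's height keeps a vertex from being counted twice, which sidesteps the tangency problem for polygons. The hole problem described above remains. In the hypothetical example, a "lake" hole inside a square "state" makes a point on the lake count as outside. The function names and coordinates are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def crossings(point: Point, ring: Sequence[Point]) -> int:
    """Count crossings of a ray from `point` toward +x with one closed polygon ring."""
    x, y = point
    count = 0
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        # Half-open test on y: a vertex exactly at the ray's height is counted for
        # only one of its two edges, and horizontal edges are skipped.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                count += 1
    return count


def parity_inside(point: Point, rings: List[Sequence[Point]]) -> bool:
    """Even-odd rule over all rings (outer boundary plus any holes)."""
    return sum(crossings(point, ring) for ring in rings) % 2 == 1


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
lake = [(4, 4), (6, 4), (6, 6), (4, 6)]          # a hole in the parent contour
print(parity_inside((2, 2), [outer, lake]))     # on land: one crossing
print(parity_inside((5, 5), [outer, lake]))     # on the lake: two crossings
```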
2.2 The Common Sense Logic Behind the Inner Annulus Discriminator Technique.

Intuitively,  the  inner  annulus  of  a  closed  contour  is  a  set  of  points  adjoining  the  inner  edge  of  the 
contour.  The  annulus  is  chosen  to  lie  inside  the  contour  because  it  is  more  computationally  efficient 
than  constructing  it  outside  (an  inner  track  is  shorter).  The  sole  purpose  of  generating  the  inner  annulus 
is  to  create  a  memory-efficient  computing  technique  to  differentiate  between  the  interior  and  exterior 
of  the  contour.  Simply  stated,  a  check  is  made  to  determine  if  the  point  is  nearer  the  contour  or  its  inner 
annulus.  If  it  is  nearer  the  annulus,  it  is  decided  that  the  point  is  inside  the  contour;  otherwise,  it  is 
outside  (Figure  2). 


[Figure 2 graphic. Legend: Original Contour; Inner Annulus. Test points P1 and P2 are marked.]


Figure 2. A graphic to illustrate the utility of the inner annulus of a closed contour.
Point P2 is nearer the annulus and therefore inside; conversely, P1 is outside.


2.3  How  Multiply-Connected  Contours  are  Accommodated  by  the  Inner  Annulus  Technique. 

A hole contained in a closed contour is itself a closed contour; therefore it possesses an inner annulus.
Equidistance loci-reduction (see the discussion at 1.0) tells us whether a point is nearer a multiply-connected parent
contour or one of the holes it contains. If the point is nearer the parent contour, a test is made to
determine whether the point is nearer the contour or its inner annulus, to assert exterior or interior,
respectively. If the point is nearer a hole, the test is performed on the hole and its inner annulus, to
assert exterior or interior to the hole, respectively. But interior to the hole means exterior to the parent
contour, from the definition of a multiply-connected set. In practice, this paradox is circumvented by
resorting to the feature orientation lattice, within which the topology of a multiply-connected set is
embedded.


2.4  Automated  Generation  of  the  Inner  Annulus  of  a  Closed  Contour. 

Definition 2.4a. An inner annulus generator function Ig is a function with domain the closed contour Tj
and range Ij, defined as follows:

a) Starting at an arbitrary point, order Tj in a counterclockwise direction and call the result Tj(n),
where n is the length of Tj.

b) For i from 2 to n + 1, let Tj(i) = (x_i, y_i), Tj(i + 1) = (x_{i+1}, y_{i+1}), and Tj(i − 1) = (x_{i-1}, y_{i-1}),
where Tj(n + 1) = Tj(1) (indices wrap around the closed contour). Furthermore, let Δx = x_i − x_{i-1} and
Δy = y_i − y_{i-1}; Δx_new = x_{i+1} − x_i and Δy_new = y_{i+1} − y_i.


  Conditions                                        Element of Ij Produced

  F1.   Δx > 0 and Δy ≥ 0                           Insert (x_{i-1}, y_{i-1} + 1)
   Sc1. [and Δy = 0, Δx_new > 0, Δy_new < 0]        [Insert (x_{i-1} + 1, y_{i-1} + 1)]
   Sc2. [and Δy > 0, Δx_new > 0, Δy_new < 0]        [Insert (x_{i-1} + 1, y_{i-1} + 2)]
   St*. [and Δy = 0, Δx_new = 0, Δy_new > 0]        [Remove (x_{i-1}, y_{i-1})]

  F2.   Δx ≥ 0 and Δy < 0                           Insert (x_{i-1} + 1, y_{i-1})
   Sc1. [and Δx = 0, Δx_new < 0, Δy_new < 0]        [Insert (x_{i-1} + 1, y_{i-1} − 1)]
   Sc2. [and Δx > 0, Δx_new < 0, Δy_new < 0]        [Insert (x_{i-1} + 2, y_{i-1} − 1)]
   St*. [and Δx = 0, Δx_new > 0, Δy_new = 0]        [Remove (x_{i-1}, y_{i-1})]

  F3.   Δx ≤ 0 and Δy > 0                           Insert (x_{i-1} − 1, y_{i-1})
   Sc1. [and Δx = 0, Δx_new > 0, Δy_new > 0]        [Insert (x_{i-1} − 1, y_{i-1} + 1)]
   Sc2. [and Δx < 0, Δx_new > 0, Δy_new > 0]        [Insert (x_{i-1} − 2, y_{i-1} + 1)]
   St*. [and Δx = 0, Δx_new < 0, Δy_new = 0]        [Remove (x_{i-1}, y_{i-1})]

  F4.   Δx < 0 and Δy ≤ 0                           Insert (x_{i-1}, y_{i-1} − 1)
   Sc1. [and Δy = 0, Δx_new < 0, Δy_new > 0]        [Insert (x_{i-1} − 1, y_{i-1} − 1)]
   Sc2. [and Δy < 0, Δx_new < 0, Δy_new > 0]        [Insert (x_{i-1} − 1, y_{i-1} − 2)]
   St*. [and Δy = 0, Δx_new = 0, Δy_new < 0]        [Remove (x_{i-1}, y_{i-1})]
F1-F4  are  first  order  operators. 
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Scl,  Sc2,  sndSf*  are  second  order  operators. 

Definition  2.4b.  The  inner  annulus  Ij  is  the  set  of  points  produced  by  inner  annulus  generator  function 
Ig  operating  on  closed  trace  contour  Tj. 
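The operator table can be exercised mechanically. The following Python sketch is one reading of Definition 2.4a, not the authors' code. It assumes the contour is a counterclockwise list of 8-connected integer pixels with the start point not repeated, visits the triples in the order of step b), and applies every Remove entry only after generation has finished, since St* can name a point that is produced at a later step. The function name `inner_annulus` and the keys of the returned counts are hypothetical.

```python
from collections import Counter

Point = tuple[int, int]


def inner_annulus(contour: list[Point]) -> tuple[set[Point], dict[str, int]]:
    """Apply F1-F4, Sc1, Sc2 and St* (Definition 2.4a) to a closed contour T_j.

    Returns the inner annulus I_j and the counts n, |C|, |T*|, |D(2)|, |D(3)|, |D(4)|.
    """
    n = len(contour)
    generated: list[Point] = []  # every element produced, duplicates kept
    concave: set[Point] = set()  # the set C: points added by Sc1 / Sc2
    removed: set[Point] = set()  # points named by St* (first elements of convex corner triplets)
    t_star = 0
    for i in range(n):
        xp, yp = contour[i]              # T_j(i-1)
        xc, yc = contour[(i + 1) % n]    # T_j(i)
        xn, yn = contour[(i + 2) % n]    # T_j(i+1)
        dx, dy = xc - xp, yc - yp
        dxn, dyn = xn - xc, yn - yc
        if max(abs(dx), abs(dy)) != 1:
            raise ValueError("consecutive contour pixels must be 8-neighbours")
        extra = None
        if dx > 0 and dy >= 0:           # F1: E or NE
            first = (xp, yp + 1)
            if dxn >= 0 and dyn < 0:     # Sc1 when dy == 0, Sc2 when dy > 0
                extra = (xp + 1, yp + 1) if dy == 0 else (xp + 1, yp + 2)
            corner = dy == 0 and dxn == 0 and dyn > 0
        elif dx >= 0 and dy < 0:         # F2: S or SE
            first = (xp + 1, yp)
            if dxn < 0 and dyn <= 0:
                extra = (xp + 1, yp - 1) if dx == 0 else (xp + 2, yp - 1)
            corner = dx == 0 and dxn > 0 and dyn == 0
        elif dx <= 0 and dy > 0:         # F3: N or NW
            first = (xp - 1, yp)
            if dxn > 0 and dyn >= 0:
                extra = (xp - 1, yp + 1) if dx == 0 else (xp - 2, yp + 1)
            corner = dx == 0 and dxn < 0 and dyn == 0
        else:                            # F4: W or SW (dx < 0, dy <= 0)
            first = (xp, yp - 1)
            if dxn <= 0 and dyn > 0:
                extra = (xp - 1, yp - 1) if dy == 0 else (xp - 1, yp - 2)
            corner = dy == 0 and dxn == 0 and dyn < 0
        generated.append(first)
        if extra is not None:
            generated.append(extra)
            concave.add(extra)
        if corner:                       # St*: T_j(i-1) will be produced on the contour
            removed.add((xp, yp))
            t_star += 1
    visits = Counter(generated)
    annulus = set(visits) - removed
    stats = {"n": n, "C": len(concave), "T*": t_star}
    for k in (2, 3, 4):
        stats[f"D({k})"] = sum(1 for count in visits.values() if count >= k)
    return annulus, stats
```

As a check, a 2x2 square yields an empty annulus (every first-order element lands on the contour and is removed), while a four-pixel diamond and a 3x3 square each yield their single interior pixel, produced four times; these give the same counts as rows (c), (d) and (e) of the table in Section 4. An L-shaped contour of length 16 triggers Sc1 once and returns its five interior pixels.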

2.5  The  Interior-Exterior  Discrimination  Function. 

Once  the  inner  annulus  of  a  closed  curve  is  generated,  the  discrimination  process  is  a  simple  comparison 
of  distance  to  the  curve  with  distance  to  its  annulus. 

Definition 2.5. A point $(x, y)$ is said to be on the exterior of closed contour $T_j$, with inner annulus $I_j$, iff $d_4[(x,y), T_j] < d_4[(x,y), I_j]$. Otherwise, $(x, y)$ is said to be on the interior of $T_j$.
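Definition 2.5 reduces to two nearest-pixel distance computations. The brute-force Python sketch below assumes that $d_4$ is the city-block (4-connected) distance and that contour and annulus are given as pixel sets; it costs O(n) per query rather than the sorted-lookup scheme discussed in Section 5. The helper names and the 3x3-square example are hypothetical.

```python
Point = tuple[int, int]


def d4(p: Point, pixels: set[Point]) -> float:
    """City-block distance from p to the nearest pixel of the set (infinity if empty)."""
    return min((abs(p[0] - q[0]) + abs(p[1] - q[1]) for q in pixels), default=float("inf"))


def is_exterior(p: Point, contour: set[Point], annulus: set[Point]) -> bool:
    """Definition 2.5: p is exterior iff d4(p, T_j) < d4(p, I_j); otherwise interior."""
    return d4(p, contour) < d4(p, annulus)


# Example: a 3x3 square contour whose inner annulus is its centre pixel.
contour = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)}
annulus = {(1, 1)}
```

With this example, `is_exterior((1, 1), contour, annulus)` is `False` (distance 1 to the contour, 0 to the annulus) and `is_exterior((5, 5), contour, annulus)` is `True` (6 versus 8).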


Figure  3.  The  first-order  inner  annulus  generator  operators.  In  actuality,  there  are  four 
distinct outputs, since the following pairs of operators produce the same points: NE-E,
SE-S,  SW-W,  and  NW-N. 


2.6  Concave  Points  Require  Forceful  Introduction:  the  Set  C. 

There are situations when the first order inner annulus generator operators are not sufficient to produce a continuous annulus; examples are the concavity situations in Figure 4 (a), (b), and (c). In such cases, to ensure continuity, it is necessary to invoke second-order concavity operators to add points to the inner annulus. In case (b) of Figure 4, the discriminator commits an error unless the first order outputs are supplemented: if point III is not added to the annulus, then it is closer by one pixel to the original contour, resulting in the erroneous decision that it lies outside.

For a given trace contour, the set C is defined to be the set of all points introduced by the second order concavity operators.
Figure 4. Sources of local concavity, three of which require special second order operators to introduce additional pixels. To ensure a continuous inner annulus, behaviors a, b, and c require that a third element (III) be attached to the inner annulus after generation of elements I and II.


2.7  Intersections  with  the  Original  Contour:  the  Set  T*. 

The  possibility  exists  that  extraneous  elements  belonging  to  the  original  closed  contour  may  be 
introduced  by  the  first  order  operators  of  the  inner  annulus  generator  function.  Just  as  concavities  in  a 
closed  contour  mandate  that  second  order  operators  be  invoked  to  introduce  points  forsaken  by  the 
first  order  operators,  local  convexities  require  second  order  treatment  to  remove  misbegotten  points 
(Figure  5c).  It  will  be  shown  that  all  such  points  are  produced  by  the  convex  corner  triplets  of  the 
contour,  but  first  it  is  necessary  to  formally  define  what  is  meant  by  a  convex  corner  triplet. 


[Figure 5 diagrams (a)-(d): Local Convexity]


Figure 5. Sources of local convexity, three of which require special second order operators to remove misbegotten pixels. Three of the sources generate problematic annulus elements: behaviors b and d generate the same element (**) twice, whereas behavior c generates an element (1) on the original contour. Note that the figures are respective mirror images through the vertical axis of those shown for local concavity in Figure 4.


Definition 2.7.1. Let $(x_{i-1}, y_{i-1})$, $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ be three consecutive elements of a counterclockwise-ordered contour $T_j$. Let $\Delta x = x_i - x_{i-1}$, $\Delta y = y_i - y_{i-1}$, $\Delta x_{new} = x_{i+1} - x_i$ and $\Delta y_{new} = y_{i+1} - y_i$. If one of the following conditions is true, then $\{(x_{i-1}, y_{i-1}), (x_i, y_i), (x_{i+1}, y_{i+1})\}$ is said to be a convex corner triplet of contour $T_j$.

    $\Delta x$    $\Delta y$    $\Delta x_{new}$    $\Delta y_{new}$
    $> 0$         $= 0$         $= 0$               $> 0$
    $= 0$         $> 0$         $< 0$               $= 0$
    $< 0$         $= 0$         $= 0$               $< 0$
    $= 0$         $< 0$         $> 0$               $= 0$

The  set  T*  is  the  set  of  all  convex  corner  triplets  contained  in  a  closed  contour. 

Theorem  2.7.  An  inner  annulus  generator  function  Ig  produces  a  point  on  the  original  contour  if  and 
only  if  the  point  is  the  first  element  of  a  convex  corner  triplet. 

The  proof  consists  of  two  parts: 

i.)  If  a  point  generated  by  an  inner  annulus  generator  function  Ig  is  on  the  original  contour,  then  the 
point  must  be  the  first  element  of  a  convex  corner  triplet. 

Proof (by enumeration): from a given contour point $(x_s, y_s)$, there are 8 possible directions in which to proceed, 4 diagonal (to the D-connected pixels) and 4 non-diagonal (to the 4-connected pixels). Each of these moves in turn may proceed to one of five points (without violating the condition that a contour contains no loops). Thus, there are $4 \cdot 5 + 4 \cdot 5 = 40$ possible triplets emanating from the start point. The search space may be reduced by exploiting the fact that orthogonal rotations preserve triplet shape. Since the 4 diagonal moves are rotational variants of each other, as are the 4 non-diagonal moves, without loss of generality a move to the NE is selected as the first diagonal move, and a move to the N as the first non-diagonal move, resulting in a reduction of the search space from 40 to 10. From the start point $(x_s, y_s)$, the diagonal (NE) move generates the point $(x_s+1, y_s+1)$ and the non-diagonal (N) move generates $(x_s, y_s+1)$. From each of these points in turn there are 5 possible moves, as enumerated in the following table:

                   D-connected (NE move)                  4-connected (N move)
                   Original            $I_g$-Generated    Original            $I_g$-Generated
Start              $(x_s, y_s)$        -                  $(x_s, y_s)$        -
First move         $(x_s+1, y_s+1)$    $(x_s, y_s+1)$     $(x_s, y_s+1)$      $(x_s-1, y_s)$
Second move a)     $(x_s+2, y_s+2)$    $(x_s+1, y_s+2)$   $(x_s, y_s+2)$      $(x_s-1, y_s+1)$
            b)     $(x_s, y_s+2)$      $(x_s, y_s+1)$     $(x_s-1, y_s+2)$    $(x_s-1, y_s+1)$
            c)     $(x_s+1, y_s+2)$    $(x_s, y_s+1)$     $(x_s-1, y_s+1)$    $\underline{(x_s, y_s)}$
            d)     $(x_s+2, y_s+1)$    $(x_s+1, y_s+2)$   $(x_s+1, y_s+2)$    $(x_s, y_s+2)$
            e)     $(x_s+2, y_s)$      $(x_s+2, y_s+1)$   $(x_s+1, y_s+1)$    $(x_s, y_s+2)$

Backtracking reveals that the underscored point is the only point contained in the original contour. Furthermore, it is the first element of the convex corner triplet $\{(x_s, y_s), (x_s, y_s+1), (x_s-1, y_s+1)\}$. In similar fashion, it can be shown that the 4-connected (non-diagonal) moves E, S and W produce generator elements which are the first points of convex corner triplets lying on the original contour. This completes part i) of the proof.


ii.)  If  a  point  is  the  first  element  of  a  convex  corner  triplet,  then  it  is  assigned  by  the  inner  annulus 
generator  function  Ig  to  the  original  contour. 

Proof: Without loss of generality, let the convex corner triplet consist of the points $(x_s, y_s)$, $(x_s, y_s+1)$, and $(x_s-1, y_s+1)$, which is a north move followed by a west move. Then, by definition of the inner annulus generator function, the first pair of points generates the interior point $(x_s-1, y_s)$, whereas the second pair generates $(x_s, y_s)$, which is the first point of the convex corner triplet we started with. In similar fashion, it follows that the other three orientations of a convex corner triplet yield intersections with the original contour. This completes the proof of Theorem 2.7.
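Part i) of the proof is a finite enumeration, so it can be checked mechanically. The Python sketch below enumerates all 40 triplets, taking the "five points" of the proof to be the second moves that turn by at most 90 degrees (this matches both columns of the table above). For each triplet it computes the first-order element produced by the second move and confirms that this element lies on the triplet exactly for the four convex-corner patterns of Definition 2.7.1, and then only at the triplet's first element. It checks the local claim only, and the names are hypothetical.

```python
# The 8 moves in counterclockwise order, 45 degrees apart: E, NE, N, NW, W, SW, S, SE.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

# (incoming move, outgoing move) pairs from the table of Definition 2.7.1.
CONVEX_CORNERS = {((1, 0), (0, 1)), ((0, 1), (-1, 0)), ((-1, 0), (0, -1)), ((0, -1), (1, 0))}


def first_order_offset(move):
    """Offset of the F1-F4 element from the pixel where the move starts."""
    dx, dy = move
    if dx > 0 and dy >= 0:   # F1: E, NE
        return (0, 1)
    if dx >= 0 and dy < 0:   # F2: S, SE
        return (1, 0)
    if dx <= 0 and dy > 0:   # F3: N, NW
        return (-1, 0)
    return (0, -1)           # F4: W, SW


def check_theorem_2_7():
    """Return (triplets examined, triplets whose produced element lies on the contour)."""
    examined = on_contour = 0
    for a, first in enumerate(DIRS):
        for turn in (-2, -1, 0, 1, 2):          # turns of at most 90 degrees
            second = DIRS[(a + turn) % 8]
            prev = (0, 0)
            cur = first
            nxt = (cur[0] + second[0], cur[1] + second[1])
            ox, oy = first_order_offset(second)
            produced = (cur[0] + ox, cur[1] + oy)
            hit = produced in (prev, cur, nxt)
            assert hit == ((first, second) in CONVEX_CORNERS)
            if hit:
                assert produced == prev          # always the first element of the triplet
                on_contour += 1
            examined += 1
    return examined, on_contour
```

Running `check_theorem_2_7()` returns `(40, 4)`: of the 40 triplets, exactly the four convex corner triplets produce a point on the contour.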


2.8 Points Multiply Visited by the Inner Annulus Generator: the Duplicate Sets D(k).

It is possible for a point to be produced more than once during the generation of the inner annulus. Examples are shown in the local convexity illustration, Figure 5(b) and (d). Let D(k) denote the set of points visited at least k times during the generation of the inner annulus. When computing the length of the inner annulus, it is necessary to subtract the number of points which are visited at least twice, at least three times, and at least four times, since each such point need be counted only once in the length calculation: a point produced exactly k times belongs to D(2), ..., D(k) and is therefore subtracted k - 1 times.

The sets D(k) thus account for all points produced multiple times during generation of the inner annulus.


3.0  The  Bottom  Line:  How  Much  Memory  Does  the  Inner  Annulus  Consume? 

The first order generator function $I_g$ assigns points on a one-to-one basis from the original contour to the inner annulus. There are n such mappings. The second order concavity operators add points missed by the first order operators; there are |C| such points. The second order convexity operators remove points which are elements of the original contour; there are |T*| of these. Finally, there are points which are generated multiple times; there are |D(2)| + |D(3)| + |D(4)| of these.

Conjecture 3.0. The general expression for the length of the inner annulus $I_j$ of a closed contour $T_j$ is:

$$|I_j| = n + |C| - |T^*| - \sum_{k=2}^{4} |D(k)|,$$

where

n = the length of contour $T_j$
C = the set of local concavities contained in $T_j$
T* = the set of convex corner triplets contained in $T_j$
D(k) = the set of points produced at least k times by generator $I_g$.
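Conjecture 3.0 is plain arithmetic once the counts are known. The Python sketch below (hypothetical names) derives |D(2)|, |D(3)|, |D(4)| from a tally of how often each point was produced and evaluates the conjectured length. Plugging in the counts reported in Section 4 for contours (c)-(f) and for Figure 7 (taking |D(4)| = 0 there, since the caption mentions no point produced four times) reproduces the stated annulus lengths. This checks the consistency of the reported numbers only; it proves nothing about the conjecture.

```python
from collections import Counter


def duplicate_set_sizes(visits: Counter) -> list[int]:
    """|D(k)| for k = 2, 3, 4: the number of points produced at least k times."""
    return [sum(1 for count in visits.values() if count >= k) for k in (2, 3, 4)]


def conjectured_length(n: int, c: int, t_star: int, d_sizes: list[int]) -> int:
    """Conjecture 3.0: |I_j| = n + |C| - |T*| - (|D(2)| + |D(3)| + |D(4)|)."""
    return n + c - t_star - sum(d_sizes)
```

For example, a point produced four times, as in contour (d), gives `duplicate_set_sizes(Counter({(1, 1): 4})) == [1, 1, 1]`. It is therefore subtracted three times and counted once, and `conjectured_length(4, 0, 0, [1, 1, 1])` returns 1.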

4.0  Examples  of  Automated  Inner  Annulus  Generation. 

This  section  provides  several  examples  of  the  inner  annulus  generator  function  operating  on  contours 
especially  chosen  to  exhibit  peculiar  behavior.  Also,  Conjecture  3.0  is  validated  for  the  examples. 


Contour?   n   + |C| - |T*| - |D(2)| - |D(3)| - |D(4)| = |I|

a)  No
b)  No
c)  Yes    4      0     4       0        0        0       0
d)  Yes    4      0     0       1        1        1       1
e)  Yes    8      0     4       1        1        1       1
f)  Yes    12     4     0       5        5        1       5


Figure  6.  Diagrams  (a)  and  (b)  are  not  contours  because  each  violates  a  condition  of  the 
definition  (the  first  has  length  less  than  4,  and  the  second  contains  a  loop  from  point  1 
to  point  3).  Figures  (c)  and  (d)  are  the  shortest  contours  possible.  The  inner  annulus  of 
(c)  is  the  null  set;  whereas  that  of  (d)  is  a  singleton  set.  Figure  (e)  is  interesting  because 
the  inner  annulus  generator  produces  four  elements  of  the  original  contour,  while  it 
also  produces  the  same  interior  element  four  times.  Figure  (f)  is  presented  to 
demonstrate  multiple  local  concave  behavior. 
Figure 7. The generation of the inner annulus for a sample contour of length 38. The annulus elements are numbered in the order they are produced by the generator function. Pixels marked with a T are elements of the original contour produced by convex corner triplets; there are 6 such cases. Pixels marked with a C are produced by second-order local concavity operators; there are 5 such instances. Those marked with a 0 are multiply assigned by the first order generator function; there are 9 assigned at least twice, and one assigned at least three times. By Conjecture 3.0, the length of the annulus is predicted to be 38 + 5 - 6 - 9 - 1 = 27, which is the result obtained in practice.
5.0 Complexity Considerations.


Recall  that  the  inner  annulus  is  the  set  of  pixels  4-connected  to  the  inner  edge  of  a  closed 
contour,  and  that  an  inclusion  decision  is  made  by  comparing  the  distance  to  the  contour  with  the 
distance  to  the  annulus.  For  implementation  purposes,  it  is  important  to  note  that  the  nearest  contour 
element  is  actually  4-connected  to  the  nearest  annulus  element,  which  produces  an  efficient  algorithm. 
If the original contour is of length n pixels, it can be sorted in n log n preprocessing time and stored in space of order n. Once the sort is done, a run time distance calculation to the annulus can be performed in time log n + c, where log n is the time required for binary search, and c is the constant time required to compute the 4-connected annulus element.

When  comparing  the  annulus  technique  to  competitive  techniques,  recall  that  other  techniques 
do  not  succeed  for  multiply-connected  sets,  nor  do  they  return  the  distance  and  direction  to  the 
contour.  This  latter  information  is  important  for  many  real-world  applications;  for  example,  temporal 
reasoning  with  two-dimensional  maps.  Therefore,  although  other  contour  inclusion  techniques  may 
seem to be competitive in time complexity, the annulus technique actually renders a richer source of information in the same amount of time.


6.0  Conclusions. 

A  function  which  tests  for  metrical  inclusion  of  an  arbitrary  point  within  a  multiply-connected  closed 
contour has been introduced. Metrical inclusion means that the distance and direction to the nearest point of the contour are rendered along with the inclusion decision. The technique exploits a new data
structure  called  the  inner  annulus,  which  is  automatically  computed  with  a  corresponding  generator 
function,  using  as  input  the  digital  representation  of  a  closed  Jordan  curve.  It  is  shown  that  the  annulus 
technique  provides  a  richer  source  of  information  than  the  technique  for  simply-connected  sets  based 
on  an  even  or  odd  count  of  contour  crossings,  since  the  latter  technique  does  not  return  either  distance 
or  direction  along  with  the  inclusion  decision.  A  conjecture  is  posed  concerning  the  length  of  the  inner 
annulus  in  terms  of  the  length  and  shape  of  the  original  contour.  An  attempt  to  prove  the  conjecture 
has  yielded  several  results  pertaining  to  the  convex  and  concave  behavior  of  the  parent  contour.  The 
complexity of the optimized inclusion algorithm is shown to be of order log n in execution time; it requires order n space and consumes order n log n preprocessing time.
ABSTRACT. The classical bound on the error in linear interpolation of function f on interval (a,b) is

$$\frac{1}{8}(b-a)^2 \, \|f''\|_{(a,b)}$$

where

$$\|f''\|_{(a,b)} = \max_{a \le x \le b} |f''(x)|.$$

We will show how to obtain a mesh x for which

$$(x_{i+1}-x_i)^2 \, \|f''\|_{(x_i,x_{i+1})} = \text{constant (very nearly)}, \quad 1 \le i < n.$$

The solution of this problem is important for the purpose of providing accurate functional data in tabular form for use in numerically controlled manufacturing machines (CAM).

GOOD MESHES. deBoor [1] has supplied us with a simple method for generating good meshes. His idea is to make the classical error bound roughly constant:

$$(x_{i+1}-x_i)^2 \, \|f''\|_{(x_i,x_{i+1})} = \text{constant}, \quad 1 \le i < n$$

or

$$(x_{i+1}-x_i) \, \|f''\|^{1/2}_{(x_i,x_{i+1})} = \text{constant}$$

or

$$\int_{x_i}^{x_{i+1}} \|f''\|^{1/2}_{(x_i,x_{i+1})} \, dx = c.$$

As n becomes large while |f''(x)| > 0, neighboring x's will get close together. This partially justifies the following asymptotic approximation of the norm of f'':

$$\|f''\|^{1/2}_{(x_i,x_{i+1})} \approx g(x).$$
So we solve the simpler problem

$$\int_{x_i}^{x_{i+1}} g(x) \, dx = c.$$

If we define $G(x) = \int_{x_1}^{x} g(t)\,dt$, we have

$$G(x_i) = \int_{x_1}^{x_i} g(t)\,dt = (i-1)c$$

and

$$G(x_n) = (n-1)c.$$

Therefore

$$\frac{G(x_i)}{G(x_n)} = \frac{i-1}{n-1}$$

and we finally have deBoor's method for generating good meshes for piecewise linear interpolation:

$$x_i = G^{-1}\left(\frac{i-1}{n-1} \, G(x_n)\right)$$

where

$$G(x) = \int_{x_1}^{x} g(t)\,dt$$

and

$$g(t) = |f''(t)|^{1/2}.$$


In practice, we typically have only a positive, continuous, piecewise linear estimate of g over some mesh u. We will denote this estimate of g by v. G as defined by

$$G(x) = \int_{u_1}^{x} v(t)\,dt, \quad u_1 \le x \le u_m$$

would then be piecewise quadratic and invertible in the following manner:

$$G^{-1}(G^*) = x^* = u_i + \frac{2(G^* - G_i)}{v_i + \sqrt{D}}$$

where

$$G_1 = 0, \quad G_{j+1} = G_j + (u_{j+1}-u_j)(v_j+v_{j+1})/2, \quad 1 \le j < m$$

and

$$G_i \le G^* < G_{i+1}, \quad p = (G^*-G_i)/(G_{i+1}-G_i), \quad D = (1-p)\,v_i^2 + p\,v_{i+1}^2.$$
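The inversion formula translates directly into code. The minimal Python sketch below assumes that u is increasing and v is strictly positive, so that G is strictly increasing and each target value falls in a unique subinterval. `good_mesh` is a hypothetical name, and the `bisect`-based interval search is our choice; the text does not specify how the subinterval is located.

```python
import math
from bisect import bisect_right


def good_mesh(u: list[float], v: list[float], n: int) -> list[float]:
    """deBoor good mesh x_1..x_n from a piecewise linear estimate v of g on mesh u."""
    m = len(u)
    # Cumulative trapezoid sums: G_1 = 0, G_{j+1} = G_j + (u_{j+1}-u_j)(v_j+v_{j+1})/2.
    G = [0.0]
    for j in range(m - 1):
        G.append(G[-1] + (u[j + 1] - u[j]) * (v[j] + v[j + 1]) / 2.0)
    x = [u[0]]
    for k in range(1, n - 1):
        g_star = k / (n - 1) * G[-1]                  # target (i-1)/(n-1) * G(x_n), with k = i-1
        i = min(bisect_right(G, g_star) - 1, m - 2)    # G_i <= G* < G_{i+1}
        p = (g_star - G[i]) / (G[i + 1] - G[i])
        D = (1.0 - p) * v[i] ** 2 + p * v[i + 1] ** 2
        x.append(u[i] + 2.0 * (g_star - G[i]) / (v[i] + math.sqrt(D)))
    x.append(u[-1])
    return x
```

For v(x) = 1 + x on [0, 1] (u = [0, 1], v = [1, 2]) and n = 3, G(x) = x + x^2/2, so the middle point solves G(x) = 3/4. That point is sqrt(2.5) - 1, approximately 0.5811, which is what the formula returns. A constant v gives a uniform mesh.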


Unfortunately,  good  meshes  are  not  always  quite  as  good  as  we  might  like 
them  to  be.  Specifically,  the  lengths  of  the  longer  subintervals  are  always 
overestimated  because  the  asymptotic  approximation  to  the  norm  of  f"  is  least 
valid  for  these  longer  subintervals.  This,  of  course,  leads  to  larger  error 
bounds  on  the  longer  subintervals.  The  error  bounds  on  the  shorter  subintervals 
are  always  pleasingly  uniform  because  the  asymptotic  approximation  is  most  valid 
for these shorter subintervals. In addition, it is easy to prove that for $f(x) = x^p$ ($p > 2$, $0 \le x \le 1$), the largest error bound on a good mesh is exactly equal to the largest error bound on a uniform mesh ($x_{i+1} - x_i$ = const)!
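The closing claim about $f(x) = x^p$ is easy to check numerically. For this f, $g = |f''|^{1/2}$ is proportional to $x^{(p-2)/2}$, so $G(x) \propto x^{p/2}$ and deBoor's formula gives the good mesh $x_i = ((i-1)/(n-1))^{2/p}$ in closed form. Because f'' increases on [0, 1], the maximum of |f''| on each subinterval is attained at its right end. The Python sketch below (hypothetical helper names) compares the largest classical bound $\frac{1}{8}h^2 \max|f''|$ on the two meshes.

```python
def largest_bound(mesh: list[float], p: float) -> float:
    """Largest classical bound (1/8) h^2 max|f''| over the subintervals, for f(x) = x**p."""
    # f''(x) = p(p-1) x**(p-2) is increasing on [0, 1] for p > 2, so the max is at b.
    return max((b - a) ** 2 * p * (p - 1) * b ** (p - 2) / 8.0 for a, b in zip(mesh, mesh[1:]))


def compare_meshes(p: float, n: int) -> tuple[float, float]:
    """Largest bound on the uniform mesh and on the good mesh for x**p on [0, 1]."""
    uniform = [i / (n - 1) for i in range(n)]
    good = [(i / (n - 1)) ** (2.0 / p) for i in range(n)]   # G(x) proportional to x**(p/2)
    return largest_bound(uniform, p), largest_bound(good, p)
```

On the uniform mesh the worst subinterval is the last one; on the good mesh it is the first one. Both give $p(p-1)/(8(n-1)^2)$, for example 0.015 for p = 4 and n = 11.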


BETTER  MESHES.  The  following  problem  is  correctly  described  in  [1]  as 
being  quite  difficult  to  solve  in  general: 


Find n-2 x's ($x_1$ and $x_n$ fixed) such that

$$(x_{i+1}-x_i)\,\|f''\|^{1/2}_{(x_i,x_{i+1})} = (x_{i+1}-x_i)\,\|g\|_{(x_i,x_{i+1})} = c \quad \text{for } 1 \le i < n.$$

Even if we knew what c was, solving

$$(x_{i+1}-x_i)\,\|g\|_{(x_i,x_{i+1})} = c$$

for $x_{i+1}$ given $x_i$ would still be quite difficult in general.

The  interesting  thing  is  that  this  problem  is  not  difficult  to  solve  if  we 
substitute  v  for  g!  In  addition,  if  v  is  a  very  good  approximation  to  g  (with  m 
>>  n),  we  will  get  a  virtually  constant  error  bound  for  the  entire  mesh.  At  any 
rate,  irrespective  of  the  accuracy  of  v,  we  will  be  doing  the  best  we  can  under 
the  circumstances  to  solve  the  original  problem. 

MESH FUNCTION μ(c). For the correct value of c and for the correct mesh x,

$$(x_{i+1}-x_i)\,\|v\|_{(x_i,x_{i+1})} = c$$

or

$$\int_{x_i}^{x_{i+1}} \|v\|_{(x_i,x_{i+1})} \, dx = c.$$

Defining the piecewise constant function y by

$$y(x) = \|v\|_{(x_i,x_{i+1})} \quad (x_i \le x < x_{i+1}),$$

we obviously have

$$y(x) \ge v(x) \text{ for all } x.$$
Now, since

$$\int_{x_i}^{x_{i+1}} y(x)\,dx = c,$$

we have

$$\int_{x_1}^{x_n} y(x)\,dx = (n-1)c,$$

but $y(x) \ge v(x) \ge 0$, so

$$\int_{x_1}^{x_n} y(x)\,dx \ge \int_{u_1}^{u_m} v(x)\,dx,$$

that is,

$$(n-1)c \ge \int_{u_1}^{u_m} v(x)\,dx.$$

Therefore, a lower bound on the correct value of c is given by

$$\frac{1}{n-1}\int_{u_1}^{u_m} v(x)\,dx.$$

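For an incorrect value of c ($\bar c$), we define $\bar y$ and $\bar x$:

$$\bar y(x) = \|v\|_{(\bar x_i, \bar x_{i+1})} \quad (\bar x_i \le x < \bar x_{i+1})$$

where

$$(\bar x_{i+1} - \bar x_i)\,\|v\|_{(\bar x_i, \bar x_{i+1})} = \bar c.$$

Therefore

$$\int_{\bar x_i}^{\bar x_{i+1}} \bar y(x)\,dx = \bar c$$

and

$$\int_{\bar x_1}^{\bar x_{\nu+1}} \bar y(x)\,dx = (\nu-1)\bar c + \int_{\bar x_\nu}^{\bar x_{\nu+1}} \bar y(x)\,dx$$

where ν is the number of subintervals over which $\bar y$ is defined, $\bar x_{\nu+1} = u_m$, and

$$\int_{\bar x_\nu}^{\bar x_{\nu+1}} \bar y(x)\,dx \le \bar c.$$

We also have

$$(\nu-1)\bar c = \int_{\bar x_1}^{\bar x_\nu} \bar y(x)\,dx;$$

therefore,

$$\nu = 1 + \frac{1}{\bar c}\int_{\bar x_1}^{\bar x_\nu} \bar y(x)\,dx.$$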


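Now since

$$\int_{u_1}^{u_m} \bar y(x)\,dx$$

is bounded below by

$$\int_{u_1}^{u_m} v(x)\,dx$$

and above by

$$(u_m-u_1)\,\|v\|_{(u_1,u_m)},$$

ν is large for small $\bar c$ and small for large $\bar c$. Now, for the correct value of c, we want

$$\int_{\bar x_1}^{\bar x_{\nu+1}} \bar y(x)\,dx = \int_{x_1}^{x_n} y(x)\,dx$$

or

$$(\nu-1)c + \int_{\bar x_\nu}^{\bar x_{\nu+1}} \bar y(x)\,dx = (n-1)c.$$

We therefore define the mesh function μ by

$$\mu(c) = (\nu-n)c + (u_m - \bar x_\nu)\,\|v\|_{(\bar x_\nu, u_m)}.$$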

For small c, ν will be large and μ will be positive. For large c, ν will be small and μ will be negative. For the correct value of c, μ will be zero. But may there be more than one zero of μ? For the correct value of c, we will have

$$(u_m - x_\nu)\,\|v\|_{(x_\nu, u_m)} = c$$

and

$$\nu = n-1.$$

For a slightly smaller value of c, $\bar x_\nu$ will be very close to $u_m$, and we will have ν = n. This means that the entire (nearly zero) contribution to μ will be made by

$$(u_m - \bar x_\nu)\,\|v\|_{(\bar x_\nu, u_m)}.$$
Now suppose that the original function f has a perfectly linear region on the right and v is identically equal to zero over a finite interval. The norm of v will therefore be zero, and further slight changes of c will still not change ν and will still maintain a zero norm for v. We therefore conclude that μ can be identically equal to zero for some finite interval to the left of the correct c. This is not really a problem, however, since we only need to be sure to compute the rightmost zero of μ.

We now show the algorithm for computing μ(c). To compute μ(c):

a := u_1, x_1 := u_1, ν := 0

When μ has been evaluated for the last time, near the correct value of c, we shall have collected the correct mesh x.

SOLVING $(b-a)\,\|v\|_{(a,b)} = c$ FOR b. This nonlinear equation turns out to be quite easy to solve noniteratively due to the piecewise linearity of v.

Given a, i, and c such that $u_i \le a < u_{i+1}$, we want to find b such that $(b-a)\,\|v\|_{(a,b)} = c$. We set j = i + 1 initially and increment j as necessary until

$$M_R(u_j - a) > c$$

and

$$M_L(u_{j-1} - a) \le c$$

where

$$M_L = \begin{cases} v(a) & \text{if } j = i+1 \\ \max\{v(a), v_{i+1}, \ldots, v_{j-1}\} & \text{if } j > i+1 \end{cases}$$

and $M_R = \max\{M_L, v_j\}$. If $M_R = M_L$, then trivially $b = a + c/M_L$.

If $M_R > M_L$, we first need to compute the transition point t. Let

$$s = \frac{v_j - v_{j-1}}{u_j - u_{j-1}} = \frac{M_R - M_L}{u_j - t}.$$

Therefore,

$$t = u_j - (M_R - M_L)/s.$$

If $M_L(t-a) \ge c$, b must lie to the left of t, so we want

$$(b-a)M_L = c.$$

Again we have trivially that

$$b = a + c/M_L.$$

Now if $M_L(t-a) < c$, then b must lie between t and $u_j$. We therefore want

$$(b-a)\,v(b) = c,$$

but

$$\frac{M_R - v(b)}{u_j - b} = s.$$

Therefore,

$$v(b) = M_R - s(u_j - b)$$

and

$$c = (b-a)(M_R - s(u_j-b)) = (b-a)(M_R - s(u_j - a + a - b)) = (b-a)(M_R - s(u_j-a) + s(b-a)).$$

We therefore have the following quadratic equation for b - a:

$$s(b-a)^2 + (M_R - s(u_j-a))(b-a) - c = 0.$$

Letting $k = M_R - s(u_j - a)$, we have

$$b - a = \frac{-k \pm \sqrt{k^2 + 4sc}}{2s}.$$

We need the plus sign in order to get b - a > 0 irrespective of the sign of k. We therefore have

$$b = a + \frac{\sqrt{k^2 + 4sc} - k}{2s} \quad \text{for } k < 0,$$

and for $k \ge 0$, we rationalize to get

$$b = a + \frac{2c}{k + \sqrt{k^2 + 4sc}}.$$

Thus we see that the computational complexity of solving $(b-a)\,\|v\|_{(a,b)} = c$ for b is nearly trivial indeed, making the "better mesh" algorithm quite efficient.
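The case analysis above maps onto a short routine. The Python sketch below follows the steps in the order given and mirrors the FINDS subroutine of the Appendix, but it is our own rendering, not a verified port. It uses 0-based indices and assumes that u is increasing, that v > 0, and that u[i] <= a < u[i+1]. It returns b together with $\|v\|_{(a,b)}$ and the index j; when even $b = u_m$ gives a product no larger than c, it returns $b = u_m$, as FINDS does. `find_b` is a hypothetical name.

```python
import math


def find_b(a: float, i: int, c: float, u: list[float], v: list[float]) -> tuple[float, float, int]:
    """Solve (b - a) * max_{a<=x<=b} v(x) = c for b, with v piecewise linear on mesh u.

    Returns (b, norm of v on (a, b), j), where b lies in [u[j-1], u[j]].
    """
    s = (v[i + 1] - v[i]) / (u[i + 1] - u[i])
    m_left = v[i] + s * (a - u[i])            # M_L = v(a) while j = i + 1
    j = i + 1
    while True:
        m_right = max(m_left, v[j])           # M_R
        if m_right * (u[j] - a) > c:
            break
        if j == len(u) - 1:                   # end of the preliminary mesh reached
            return u[j], m_right, j
        j += 1
        m_left = m_right
    if m_right == m_left:                     # flat maximum: b = a + c / M_L
        return a + c / m_left, m_left, j
    s = (v[j] - v[j - 1]) / (u[j] - u[j - 1])
    t = u[j] - (m_right - m_left) / s         # transition point
    if (t - a) * m_left >= c:                 # b lies to the left of t
        return a + c / m_left, m_left, j
    k = m_right - s * (u[j] - a)
    root = math.sqrt(k * k + 4.0 * s * c)
    # Plus sign of the quadratic formula; rationalized form when k >= 0.
    b = a + 2.0 * c / (k + root) if k >= 0 else a + (root - k) / (2.0 * s)
    return b, v[j - 1] + s * (b - u[j - 1]), j
```

For v rising linearly from 1 to 3 on [0, 1], `find_b(0.0, 0, 1.0, [0.0, 1.0], [1.0, 3.0])` takes the k >= 0 branch and returns b = 0.5, where v(b) = 2 and (b - a) v(b) = 1.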

SOME COMPUTATIONAL RESULTS. All the following examples have been obtained using a uniform preliminary mesh of size m and exact evaluation of f''. The first two figures (Figures 2 and 3) show a virtually constant error bound pattern for the two functions $x^{100}$ and $(1-x)^{100}$. These two functions have very flat regions to the left and right, respectively, and their respective meshes are naturally mirror images of each other. The mesh functions for these two cases are quite different, however, as indicated by Figures 4 and 5. The $(1-x)^{100}$ mesh function behaves as it does due to the large flat section to the right. Note the lack of monotonicity further to the left of the correct value of c and the way μ becomes and remains monotonic as c is approached from the left. Also note that the correct values of c are the same for both cases, as expected.

Figures 6 through 9 compare good and better meshes for the test function $x^{10}(1-x)^{20}$. Note the larger error bounds for the longer subintervals in the good mesh. Figures 10 and 11 indicate the effect of reducing m and the accuracy of v.


Figures  12  through  15  compare  good  and  better  meshes  for  a  larger  value  of 
n.  Figure  16  shows  the  effect  of  small  m.  Note  that  there  is  hardly  any 
discernible  difference  between  the  better  meshes  for  large  and  small  m.  Figures 
17  and  18  show  the  equality  of  the  largest  bounds  on  uniform  and  good  meshes, 
respectively.  Figure  19  shows  the  better  mesh  bounds  for  this  case. 

The code contained in the Appendix, written in SALOME, computes better meshes. Reference [2] may be consulted for interpretation of the code, but it should be sufficient to know that IF and FI delimit conditional statements, DO and OD delimit looping statements, a comma in a conditional means "then," a semicolon in a conditional means "else," and sharp signs (#) delimit loop exit conditions.
[Figure: ERROR BOUND PATTERN FOR BETTER MESH, X**100, M=700 N=7]

[Figure 6: FUNCTION DEFINED ON GOOD MESH, X**10(1-X)**20, M=1000 N=10]

[Figure: ERROR BOUND PATTERN FOR GOOD MESH, X**10(1-X)**20, M=1000 N=10]

[Figure: ERROR BOUND PATTERN FOR BETTER MESH, X**10(1-X)**20, M=1000 N=10]

[Figure: ERROR BOUND PATTERN FOR BETTER MESH, X**10(1-X)**20, M=600 N=60]

[Figure: ERROR BOUND PATTERN FOR GOOD MESH, X**5, M=100 N=10]

APPENDIX 


SUB UEBMSH ( M U V N X RELERR ) - UEBMSH

-- FOR UNIFORM ERROR BOUND MESH, SOLVE FMU(C)=0 BY BISECTION
--
-- M=NO. OF ABSCISSAS IN PRELIMINARY MESH
-- U=ABSCISSAS IN PRELIMINARY MESH
-- V=APPROXIMATION TO SQUARE ROOT OF ABSOLUTE VALUE OF
--   SECOND DERIVATIVE ON PRELIMINARY MESH
-- N=NO. OF POINTS IN FINAL MESH
-- X=ABSCISSAS IN FINAL MESH
-- RELERR=RELATIVE ERROR IN COMPUTING C

DP U() , V() , X() , RELERR .

DP C1 , C2 , FMU , FC1 , FC2 , ABSERR , AE , C , FC , CC .

I=0  C1=0.0D0
DO I=I+1  # I >= M #
   C1=C1+(U(I+1)-U(I))*(V(I)+V(I+1))/2.0D0  OD
C1=C1/(N-1)
FC1=FMU(C1,M,U,V,N,X)
a FC1 > 0.0D0 a
C2=1.3D0*C1
FC2=FMU(C2,M,U,V,N,X)
DO # FC2 < 0.0D0 #
   IF FC2 > 0.0D0 , C1=C2 FI
   C2=1.3D0*C2  FC2=FMU(C2,M,U,V,N,X)  OD
ABSERR=RELERR*C1
AE=2.0D0*ABSERR
DO # AE < ABSERR #
   C=(C1+C2)/2.0D0
   FC=FMU(C,M,U,V,N,X)
   IF FC >= 0.0D0 , C1=C ; C2=C FI
   AE=C2-C1  OD
RET

==> GETC ( CC ) - GETC
-- GET CONSTANT
CC=C
RET  END

DPFUN FMU ( C M U V N X ) - FMU
-- MESH FUNCTION OF C TO BE ZEROED TO GET UNIFORM ERROR BOUND MESH
DP FMU , C , U() , V() , X() .
DP A , VNORM , B .
I=1  A=U(1)  X(1)=U(1)  NU=0
DO FINDS ( A I C M U V VNORM J B )  NU=NU+1
   IF NU < N , X(NU+1)=B FI
   # B = U(M) #
   A=B  I=J-1  OD
FMU=(NU-N)*C+(B-A)*VNORM
X(N)=U(M)
RET  END

SUB FINDS ( A I C M U V VNORM J B ) - FINDS
-- SOLVE (B-A)*SUPNORM(V OVER INTERVAL (A,B))=C FOR B
DP A , C , U() , V() , VNORM , B , VNORML , VNORMR .
DP S , T , E , DSQRT .
a A >= U(I) a  a A < U(I+1) a
J=I+1
S=(V(I+1)-V(I))/(U(I+1)-U(I))
VNORML=V(I)+S*(A-U(I))
-- FIND ROUGH LOCATION OF B
DO IF VNORML < V(J) , VNORMR=V(J) ; VNORMR=VNORML FI
   # VNORMR*(U(J)-A) > C #
   IF J = M , B=U(M)  VNORM=VNORMR  RET FI
   J=J+1  VNORML=VNORMR  OD
-- FIND PRECISE LOCATION OF B
-- TAKE CARE OF TRIVIAL CASE
IF VNORML = VNORMR , B=A+C/VNORML  VNORM=VNORML  RET FI
-- COMPUTE TRANSITION POINT
S=(V(J)-V(J-1))/(U(J)-U(J-1))
T=U(J)-(VNORMR-VNORML)/S
-- TAKE CARE OF OTHER TRIVIAL CASE
IF (T-A)*VNORML >= C , B=A+C/VNORML  VNORM=VNORML  RET FI
-- TAKE CARE OF NONTRIVIAL CASE
E=VNORMR-S*(U(J)-A)
IF E >= 0.0D0 , B=A+2.0D0*C/(E+DSQRT(E**2+4.0D0*S*C)) ;
   B=A+(DSQRT(E**2+4.0D0*S*C)-E)/(2.0D0*S) FI
VNORM=V(J-1)+S*(B-U(J-1))
RET  END
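For readers without a SALOME system, the Appendix routines can be rendered in Python. The following is a sketch of our own, not a verified port. It follows the structure of UEBMSH, FMU and FINDS above, uses 0-based indices, assumes v > 0 and an increasing u, and omits the assertions. The lowercase function names are hypothetical renderings of the SALOME names.

```python
import math


def finds(a, i, c, u, v):
    """FINDS: solve (b - a) * max_{[a,b]} v = c for b; returns (b, vnorm, j)."""
    s = (v[i + 1] - v[i]) / (u[i + 1] - u[i])
    vnorm_l, j = v[i] + s * (a - u[i]), i + 1
    while True:                                  # find rough location of b
        vnorm_r = max(vnorm_l, v[j])
        if vnorm_r * (u[j] - a) > c:
            break
        if j == len(u) - 1:
            return u[j], vnorm_r, j
        j, vnorm_l = j + 1, vnorm_r
    if vnorm_l == vnorm_r:                       # trivial case
        return a + c / vnorm_l, vnorm_l, j
    s = (v[j] - v[j - 1]) / (u[j] - u[j - 1])
    t = u[j] - (vnorm_r - vnorm_l) / s           # transition point
    if (t - a) * vnorm_l >= c:                   # other trivial case
        return a + c / vnorm_l, vnorm_l, j
    e = vnorm_r - s * (u[j] - a)                 # nontrivial case
    root = math.sqrt(e * e + 4.0 * s * c)
    b = a + 2.0 * c / (e + root) if e >= 0 else a + (root - e) / (2.0 * s)
    return b, v[j - 1] + s * (b - u[j - 1]), j


def fmu(c, u, v, n):
    """FMU: mesh function mu(c), plus the mesh x collected while evaluating it."""
    x = [0.0] * n
    a = x[0] = u[0]
    i = nu = 0
    while True:
        b, vnorm, j = finds(a, i, c, u, v)
        nu += 1
        if nu < n:
            x[nu] = b
        if b == u[-1]:
            break
        a, i = b, j - 1
    x[-1] = u[-1]
    return (nu - n) * c + (b - a) * vnorm, x


def uebmsh(u, v, n, relerr=1e-10):
    """UEBMSH: bracket, then bisect for the rightmost zero of mu(c); returns (c, x)."""
    # Lower bound on c: trapezoid integral of v divided by n - 1.
    c1 = sum((u[k + 1] - u[k]) * (v[k] + v[k + 1]) / 2.0 for k in range(len(u) - 1)) / (n - 1)
    c2 = 1.3 * c1
    fc2, _ = fmu(c2, u, v, n)
    while fc2 >= 0.0:                            # enlarge c2 until mu(c2) < 0
        if fc2 > 0.0:
            c1 = c2
        c2 *= 1.3
        fc2, _ = fmu(c2, u, v, n)
    abserr = relerr * c1
    while True:                                  # bisection on [c1, c2]
        c = 0.5 * (c1 + c2)
        fc, x = fmu(c, u, v, n)
        if fc >= 0.0:
            c1 = c
        else:
            c2 = c
        if c2 - c1 < abserr:
            return c, x
```

With a constant v the better mesh comes out uniform. With v(x) = 1 + x sampled on a 101-point preliminary mesh, every subinterval satisfies $(x_{i+1} - x_i)\max v = c$ to within the bisection tolerance, which is the virtually constant error bound pattern described above.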
RELATIVISTIC THERMODYNAMICS OF REAL GASES
WITH  BROKEN  INTERNAL  SYMMETRY 


Richard  A.  Weiss 

U.  S.  Army  Engineer  Waterways  Experiment  Station 
Vicksburg,  Mississippi  39180 


ABSTRACT. The relativistic state equation for real gases with broken internal symmetry is developed. This is done by solving the complex form of a relativistic trace equation for the virial state equation of the real gases.

The resulting solution affects only the third and higher virial coefficients. The complex relativistic third virial coefficient is given by a solution of two coupled differential equations. An approximate solution is found which is valid in the high temperature region, and an expression for the internal phase angle for the third virial coefficient is obtained. From this it is possible to develop expressions for the internal phase angles of the pressure, internal energy, entropy, enthalpy, and free energy of real gases that exhibit broken internal symmetries. Mixtures of interacting gases with broken internal symmetry are suggested to exhibit an interference phenomenon whereby the total pressure and internal energy will oscillate slightly in magnitude as the density of the system is increased. Accurate high temperature state equations of real gases are important for the description of the equilibrium configurations of gaseous stars, and for the description of nuclear explosions in the atmosphere.

1.  INTRODUCTION.  Spontaneously  broken  symmetry  is  a  common  phenomenon 
In  physics  because  It  appears  in  such  diverse  situations  as  ferromagnetism, 
superconductivity,  weak  Interactions,  and  the  vacuum  screening  currents  that 
produce  the  asymmetric  vacuum.  It  is  associated  with  a  phase  difference 
between  a  free  particle  In  a  potential  and  a  particle  in  a  coherent  state  of 
particles  which  forms  due  to  some  special  system  of  forces.  In  superconduc¬ 
tivity  it  is  the  Cooper  pairs  of  electrons  that  form  a  self-coherent  system 
which  violates  gauge  Invariance.  In  a  similar  fashion  the  vacuum  state  is 
thought  to  exhibit  spontaneous  symmetry  breaking  due  to  the  presence  of  a 
Higgs  scalar  field  which  has  a  nonzero  value  for  a  minimum  potential  energy. 

In relativistic thermodynamics a similar broken symmetry has been suggested to exist in the renormalized state equations of solids and quantum liquids. This broken symmetry is associated with intrinsic phase angles of the thermodynamic state functions, such as internal energy and pressure, and is due to the interaction of spacetime with bulk matter and the vacuum. The internal phase angles of the coherent state must be considered when applying the first and second laws of thermodynamics, because the differentials of the state functions, such as the entropy and internal energy, must include a rotation in internal space. This affects the measured state equation of thermodynamic systems such as solids, liquids and gases. This paper considers the vacuum-induced broken symmetry of the state functions of the real gases. The broken symmetry appears in the third and higher order virial coefficients of the state equation for real gases.
The effects of the Minkowski metric of spacetime on the equation of state of bulk matter were originally described by the solutions of the scalar trace equation

[Equation (1) is illegible in the source scan.]   (1)

where $U$ = relativistic (renormalized) internal energy, $P$ = relativistic pressure, $T$ = absolute temperature, $V$ = volume of substance, and $U^a$ and $P^a$ = corresponding nonrelativistic internal energy and pressure. Throughout this paper the index "a" refers to nonrelativistic (unrenormalized) calculations. It has been suggested that the spacetime-induced broken symmetry effects on bulk matter can be represented by the following complex number trace equation

[Equation (2) is illegible in the source scan.]   (2)

whose solution yields complex numbers, with internal phase angles, for the state functions of relativistic thermodynamics. Complex number solutions arise only in those cases for which there are nonzero gauge terms of the form

$$ [\text{gauge terms; illegible in the source scan}] \neq 0 \tag{4} $$

This includes interacting systems and the general case of the noninteracting relativistic Fermi gas.

The unrenormalized pressure and internal energy density for the real gases are given by

$$ P^a = nR^aT\left[1 + nB^a(T) + n^2C^a(T) + n^3D^a(T) + \cdots\right] \tag{5} $$

$$ E^a = nR^aT\left[\frac{3}{2} - nT\frac{dB^a}{dT} - \frac{1}{2}n^2T\frac{dC^a}{dT} - \frac{1}{3}n^3T\frac{dD^a}{dT} - \cdots\right] \tag{6} $$

where

$$ n = N/V = 1/v \tag{7} $$

where $N$ = number of moles, $v$ = molar volume; $R^a$, $B^a(T)$, $C^a(T)$ and $D^a(T)$ = nonrelativistic values of the gas constant, second virial coefficient, third virial coefficient, and fourth virial coefficient, respectively. The corresponding renormalized pressure and energy density that are obtained from the solutions of the scalar trace equation (1) are written as

$$ P = nRT\left[1 + nB(T) + n^2C(T) + n^3D(T) + \cdots\right] \tag{8} $$

$$ E = nRT\left[\frac{3}{2} - nT\frac{dB}{dT} - \frac{1}{2}n^2T\frac{dC}{dT} - \frac{1}{3}n^3T\frac{dD}{dT} - \cdots\right] \tag{9} $$


where

$$ R = R^a \tag{9A} $$

$$ B(T) = B^a(T) \tag{10} $$

$$ C(T) = C^a(T) - 3\left[B^a(T)\right]^2\ln\psi^a \tag{11} $$

$$ \psi^a = \frac{T}{T_{CR}}\left[\frac{B^a(T)}{B^a(T_{CR})}\right]^{2/3} \tag{12} $$

and where $T_{CR}$ = relativity temperature constant, and [symbol illegible in the source scan] = conjugate relativity temperature constant. An expression for the renormalized fourth virial coefficient $D(T)$ has not been obtained.
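The virial sums in equations (5), (6), (8) and (9) are easy to evaluate numerically once the coefficients are known as functions of temperature. The minimal Python sketch below assumes the coefficients are supplied as callables in SI units and estimates the temperature derivatives by central differences; the function names are hypothetical and are not taken from the paper.

```python
R = 8.314462618  # molar gas constant, J/(mol K)


def derivative(func, T, h=None):
    """Central-difference estimate of d func/dT at temperature T."""
    if h is None:
        h = 1e-5 * T
    return (func(T + h) - func(T - h)) / (2.0 * h)


def virial_pressure(n, T, B, C, D):
    """Eq. (5) or (8): P = nRT[1 + nB + n^2 C + n^3 D], truncated after D.

    n is the molar density (mol/m^3); B, C, D are callables of T.
    """
    return n * R * T * (1.0 + n * B(T) + n**2 * C(T) + n**3 * D(T))


def virial_energy_density(n, T, B, C, D):
    """Eq. (6) or (9): E = nRT[3/2 - nT B' - (1/2) n^2 T C' - (1/3) n^3 T D']."""
    return n * R * T * (
        1.5
        - n * T * derivative(B, T)
        - 0.5 * n**2 * T * derivative(C, T)
        - n**3 * T * derivative(D, T) / 3.0
    )
```

Passing unrenormalized coefficients gives equations (5) and (6); passing the renormalized ones, with $C(T)$ from equation (11), gives equations (8) and (9). With all coefficients set to zero the functions reduce to the ideal gas values $nRT$ and $\tfrac{3}{2}nRT$.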

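The solution to the complex number trace equation (2) for the real gases is written as follows, where a tilde denotes a complex number and the same symbol without a tilde denotes its magnitude:

$$ \tilde P = Pe^{j\theta_P} = nRT\left[1 + n\tilde B(T) + n^2\tilde C(T) + n^3\tilde D(T) + \cdots\right] \tag{13} $$

$$ \tilde E = Ee^{j\theta_E} = nRT\left[\frac{3}{2} - nT\frac{d\tilde B}{dT} - \frac{1}{2}n^2T\frac{d\tilde C}{dT} - \frac{1}{3}n^3T\frac{d\tilde D}{dT} - \cdots\right] \tag{14} $$

where

$$ \tilde B = Be^{j\theta_B} \tag{15} $$

$$ \tilde C = Ce^{j\theta_C} \tag{16} $$

$$ \tilde D = De^{j\theta_D} \tag{16A} $$

are to be determined from a solution of the trace equation (2). Only $\tilde B$ and $\tilde C$ are obtained in this paper, and in fact $\tilde C$ is obtained only through a high temperature approximation. It is the real parts of the complex number virial coefficients, pressure and internal energy that are the measured quantities.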
The determination of $P$ and $\theta_P$ for real gases with broken internal symmetry is important because the state equation of real gases enters the physical description of such diverse situations as the equilibrium configuration of stars and the latent heat associated with the gas-liquid phase transition. Consider, for instance, the Clausius-Clapeyron equation for a real gas with broken internal symmetry

$$ \tilde\ell = T(v_2 - v_1)\frac{d\tilde P}{dT} \tag{17} $$

where $\tilde\ell$ = complex number latent heat of vaporization, $v_2$ = specific volume of vapor, $v_1$ = specific volume of liquid, and $d\tilde P/dT$ = slope of the vapor pressure curve. The complex number latent heat of vaporization can be written as

$$ \tilde\ell = \ell e^{j\theta_\ell} \tag{18} $$

where $\ell$ = magnitude of the latent heat, and $\theta_\ell$ = internal phase angle of the latent heat of vaporization. Equation (17) can then be written as the following three equations

$$ \ell = T(v_2 - v_1)\left[\left(\frac{\partial P}{\partial T}\right)^2 + \left(P\frac{\partial\theta_P}{\partial T}\right)^2\right]^{1/2} \tag{19} $$

$$ \theta_\ell = \theta_P + \alpha_{P,T} \tag{20} $$

$$ \tan\alpha_{P,T} = \frac{P\,\partial\theta_P/\partial T}{\partial P/\partial T} \tag{21} $$

Note that if $\theta_P = 0$ the standard form of the Clausius-Clapeyron equation is regained. The measured value of the latent heat is $\ell\cos\theta_\ell$.
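As a consistency check on equations (19) through (21), the complex latent heat can be computed both directly from equation (17) and from the magnitude-and-phase form. The sketch below assumes hypothetical callables `P(T)` and `theta_P(T)` for the magnitude and internal phase of the vapor pressure, uses central differences for the temperature derivatives, and uses `math.atan2` to pick the quadrant of $\alpha_{P,T}$, which equation (21) alone leaves ambiguous.

```python
import cmath
import math


def latent_heat_polar(T, dv, P, theta_P, h=None):
    """Magnitude and phase of the complex latent heat, eqs. (19)-(21).

    P(T), theta_P(T): magnitude and internal phase of the vapor pressure.
    dv = v2 - v1. Returns (ell, theta_ell).
    """
    if h is None:
        h = 1e-5 * T
    dP = (P(T + h) - P(T - h)) / (2.0 * h)
    dtheta = (theta_P(T + h) - theta_P(T - h)) / (2.0 * h)
    ell = T * dv * math.hypot(dP, P(T) * dtheta)   # eq. (19)
    alpha = math.atan2(P(T) * dtheta, dP)          # eq. (21), quadrant from atan2
    return ell, theta_P(T) + alpha                 # eq. (20)


def latent_heat_complex(T, dv, P, theta_P, h=None):
    """Eq. (17) evaluated directly with P~ = P e^{j theta_P}."""
    if h is None:
        h = 1e-5 * T

    def P_tilde(t):
        return P(t) * cmath.exp(1j * theta_P(t))

    return T * dv * (P_tilde(T + h) - P_tilde(T - h)) / (2.0 * h)
```

The measured latent heat is then `ell * math.cos(theta_ell)`; setting `theta_P` to zero everywhere recovers the ordinary Clausius-Clapeyron result.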

Accurate state equations for the real gases, including broken symmetry effects, are important for stellar structure calculations because the internal phase angle of the radial coordinate is related to the internal phase angle of the pressure. Therefore the complex values of the third and higher virial coefficients of the real gases will play an important role in stellar equilibrium calculations. In addition, it has been suggested that the third virial coefficient of the real gases can be utilized in the design of a gravitational wave detector.

This paper calculates the internal phase angles of the second and third virial coefficients of the real gases. The phase angle of the second virial coefficient is determined to be equal to zero. The evaluation of the internal phase angle of the third virial coefficient is more complicated, and has been carried out only in the regions of high temperature. The internal phase angles of the pressure, internal energy, entropy, enthalpy, and free energy are then calculated in terms of the virial coefficients and their internal phases. The heat capacity for a real gas with broken internal symmetry is then evaluated. Finally, a general discussion of the interference effects expected to occur in mixtures of asymmetric gases is given.

2. THIRD VIRIAL COEFFICIENT FOR ASYMMETRIC REAL GASES. This section uses the complex number trace equation (2) to solve for the renormalized values of the second and third virial coefficients of the real gases with broken internal symmetry. It has been shown that the real number trace equation (1) does not change the value of the second virial coefficient, as shown in equation (10). In a similar fashion it is easy to show, by substituting equations (13) and (14) into the complex number trace equation (2), that the second virial coefficient satisfies the following relation

$$ \frac{1}{\tilde\beta}\frac{d\tilde\beta}{dT} = \frac{1}{\beta^a}\frac{d\beta^a}{dT} \tag{22} $$

where

$$ \tilde\beta = RT\tilde B \tag{23} $$

$$ \beta^a = RTB^a \tag{24} $$

Equations (22) through (24) imply that

$$ \tilde B = Be^{j\theta_B} = B_R + jB_I = B^a = \text{real number} \tag{25} $$

which gives

$$ B_I = 0 \tag{26} $$

$$ \theta_B = 0 \tag{27} $$

$$ B_R = B^a \tag{28} $$

Therefore the relativistic value of the second virial coefficient, as determined from a solution of equation (2), is a real number which is equal to the unrenormalized value of the second virial coefficient.


The calculation of the complex number value of the third virial coefficient follows in a more complicated fashion from a solution of equation (2) for the real gases. An expedient way of doing the calculation is to make use of the results for real gases that have been obtained for the relativistic form of the third virial coefficient from a solution of the scalar relativistic trace equation (1). First define a function $\tilde f(T)$ which is given by

$$ RT\tilde C(T) = RTC^a(T) + \tilde f(T) \tag{29} $$

Then substituting equations (13), (14) and (29) into the trace equation (2) gives the following equation for $\tilde f(T)$

$$ \frac{d\tilde f}{dT} + F\frac{\tilde f}{T} = G \tag{30} $$

where

$$ F = 1 - \frac{2T}{\beta^a}\frac{d\beta^a}{dT} \tag{31} $$

$$ G = -\frac{(\beta^a)^2}{RT^2}\left(1 + \frac{2T}{\beta^a}\frac{d\beta^a}{dT}\right) \tag{32} $$

and where the ideal gas value $c_V^a = 3R/2$ has been used. Equations (30) through (32) are of the same form as those that have already been obtained in Reference 6 for the scalar relativistic trace equation (1), except that $\tilde f(T)$ is a complex number.

The complex number $\tilde f$ is written as follows

$$ \tilde f = -fe^{j\theta_f} \tag{33} $$

$$ f_R = -f\cos\theta_f \tag{34} $$

$$ f_I = -f\sin\theta_f \tag{35} $$

with $f > 0$, and where $f$ and $\theta_f$ are to be determined. Note that the use of the function $f$ is different from that in Reference 6. In the present paper $f$ is used as the magnitude of a complex number and is always positive. The choice of the negative sign in equation (33) is made so that $\theta_f$ is small and does not contain the $\pi$ associated with negative real and imaginary values of $\tilde f$ in the regions of high temperature. Placing equation (33) into equation (30) gives

$$ \left(\frac{df}{dT} + jf\frac{d\theta_f}{dT} + F\frac{f}{T}\right)e^{j\theta_f} = -G \tag{36} $$

Taking the real and imaginary parts of equation (36) yields
$$ \cos\theta_f\left(\frac{df}{dT} + F\frac{f}{T}\right) - \sin\theta_f\,f\frac{d\theta_f}{dT} = -G \tag{37} $$

$$ \sin\theta_f\left(\frac{df}{dT} + F\frac{f}{T}\right) + \cos\theta_f\,f\frac{d\theta_f}{dT} = 0 \tag{38} $$

Equations (37) and (38) are the two required equations for determining $f$ and $\theta_f$. They can be rewritten as

$$ \tan\theta_f = -\frac{f\,d\theta_f/dT}{df/dT + Ff/T} \tag{39} $$

$$ \left(\frac{df}{dT} + F\frac{f}{T}\right)^2 + \left(f\frac{d\theta_f}{dT}\right)^2 = G^2 \tag{40} $$

Equations (37) and (38) are difficult to solve without some approximations. In this paper a solution of equations (37) and (38) is obtained that is valid only for high temperatures.
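Equations (37) and (38) are linear in $X = df/dT + Ff/T$ and $Y = f\,d\theta_f/dT$, so they can be solved for the derivatives: $X = -G\cos\theta_f$ and $Y = G\sin\theta_f$. That explicit form makes a numerical march of the full system straightforward. The Python sketch below assumes $F(T)$ and $G(T)$ are supplied as callables and that $f$ stays positive along the path; the classical fourth-order Runge-Kutta step is one possible numerical approach chosen for illustration, not the method used in the paper.

```python
import math


def rhs(T, f, theta, F, G):
    """df/dT and dtheta_f/dT from eqs. (37)-(38) solved for the derivatives.

    With X = df/dT + F f/T and Y = f dtheta_f/dT, eqs. (37)-(38) give
    X = -G cos(theta_f) and Y = G sin(theta_f). F and G are callables of T.
    """
    X = -G(T) * math.cos(theta)
    Y = G(T) * math.sin(theta)
    return X - F(T) * f / T, Y / f


def residuals(T, f, theta, dfdT, dthdT, F, G):
    """Left-hand side minus right-hand side of eqs. (37) and (38)."""
    X = dfdT + F(T) * f / T
    Y = f * dthdT
    r37 = math.cos(theta) * X - math.sin(theta) * Y + G(T)
    r38 = math.sin(theta) * X + math.cos(theta) * Y
    return r37, r38


def integrate(T0, T1, f0, theta0, F, G, steps=2000):
    """Classical fourth-order Runge-Kutta march of (f, theta_f) from T0 to T1."""
    h = (T1 - T0) / steps
    T, f, th = T0, f0, theta0
    for _ in range(steps):
        k1 = rhs(T, f, th, F, G)
        k2 = rhs(T + h / 2, f + h / 2 * k1[0], th + h / 2 * k1[1], F, G)
        k3 = rhs(T + h / 2, f + h / 2 * k2[0], th + h / 2 * k2[1], F, G)
        k4 = rhs(T + h, f + h * k3[0], th + h * k3[1], F, G)
        f += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        th += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        T += h
    return f, th
```

Because the right-hand side for $f$ is even in $\theta_f$ and the one for $\theta_f$ is odd, starting from $+\theta_f$ and $-\theta_f$ gives the same $f$ and opposite phases, which is the degeneracy of equations (37) and (38).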

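An approximate solution for equations (37) and (38) can be obtained by assuming that $\theta_f$ is small, so that

$$ \sin\theta_f \approx \theta_f \approx 0 \tag{41} $$

$$ \cos\theta_f \approx 1 \tag{42} $$

$$ f \approx f_o \tag{43} $$

Substituting equations (41) through (43) into equations (37) and (38) yields

$$ \frac{df_o}{dT} + F\frac{f_o}{T} = -G \tag{44} $$

$$ \frac{d\theta_f}{dT} = -\frac{\theta_f}{f_o}\left(\frac{df_o}{dT} + F\frac{f_o}{T}\right) \tag{45} $$

where all second order terms in $\theta_f$ are dropped. It has already been shown that the solution to equation (44) is

$$ f_o = 3RT(B^a)^2\ln\psi^a \tag{46} $$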
so that from equations (33) through (35) it follows that

$$ \tilde f = -3RT(B^a)^2\ln\psi^a\,e^{j\theta_f} \tag{47} $$

$$ f_R = -3RT(B^a)^2\ln\psi^a\cos\theta_f \tag{48} $$

$$ f_I = -3RT(B^a)^2\ln\psi^a\sin\theta_f \tag{49} $$

This solution is valid only when $f_o > 0$ (that is, $\psi^a > 1$), since in fact $f$ is the magnitude of a complex number. Therefore the assumption that $\theta_f$ is small and that $f_o > 0$ restricts the validity of the approximate solution to the regions of high temperature. For this case $f_R < 0$ and $f_I \lessgtr 0$. Combining equations (29) and (47) gives the following approximate relationships for the renormalized third virial coefficient

$$ \tilde C(T) = C^a(T) - 3(B^a)^2\ln\psi^a\,e^{j\theta_f} \tag{50} $$

$$ C_R(T) = C^a(T) - 3(B^a)^2\ln\psi^a\cos\theta_f \tag{51} $$

$$ C_I(T) = -3(B^a)^2\ln\psi^a\sin\theta_f \tag{52} $$

It is the real value of the third virial coefficient that is measured from experimental pressure versus volume curves at constant temperature.

It remains only to obtain $\theta_f$ from a solution of equation (45), which, using equation (44), can be rewritten as

$$ \frac{d\theta_f}{\theta_f} = \frac{G}{f_o}\,dT \tag{53} $$

$$ \ln|\theta_f| = \int\frac{G}{f_o}\,dT \tag{54} $$

Combining equations (24), (32) and (46) gives

$$ G = -R(B^a)^2\left(3 + \frac{2T}{B^a}\frac{dB^a}{dT}\right) \tag{55} $$

$$ \frac{G}{f_o} = -\frac{1}{T\ln\psi^a}\left(1 + \frac{2T}{3B^a}\frac{dB^a}{dT}\right) \tag{56} $$

But from equation (12) it follows that

$$ \frac{d\ln\psi^a}{dT} = \frac{1}{T} + \frac{2}{3B^a}\frac{dB^a}{dT} \tag{57} $$

Combining equations (54), (56) and (57) gives

$$ \ln|\theta_f| = -\int\frac{1}{\ln\psi^a}\frac{d\ln\psi^a}{dT}\,dT = -\ln\left(\ln\psi^a\right) + \ln b \tag{58} $$

and therefore

$$ \theta_f = \pm\frac{b}{\ln\psi^a} \tag{59} $$

where $b$ = constant whose value can only be determined from the full solution of equations (37) and (38). The solution in equation (59) is valid only when $\theta_f$ is a small number, and in general this limits the application of equation (59) to the regions of high temperature. Note that equations (37) and (38) are unchanged for $\theta_f \to -\theta_f$, so that both $\pm\theta_f$ are valid solutions, and therefore these equations exhibit degeneracy.
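The high temperature solution can be checked numerically by substituting equations (46) and (59) back into equations (44) and (45). The sketch below assumes a hypothetical van der Waals-like second virial coefficient $B^a(T) = b_0 - a_0/T$, hypothetical values of $T_{CR}$ and $b$, and equation (12) in the form $\psi^a = (T/T_{CR})[B^a(T)/B^a(T_{CR})]^{2/3}$; all numbers are illustrative only and do not describe any particular gas.

```python
import math

R = 8.314462618           # molar gas constant, J/(mol K)

# Hypothetical model and constants, for illustration only.
b0, a0 = 3.2e-5, 1.3e-2   # B^a(T) = b0 - a0/T, in m^3/mol
T_CR = 1000.0             # hypothetical relativity temperature constant, K
b = 0.05                  # hypothetical constant of eq. (59)


def B(T):
    return b0 - a0 / T


def dB(T):
    return a0 / T**2


def ln_psi(T):
    """ln psi^a for the assumed form psi^a = (T/T_CR) [B^a(T)/B^a(T_CR)]^(2/3)."""
    return math.log(T / T_CR) + (2.0 / 3.0) * math.log(B(T) / B(T_CR))


def F(T):
    """Eq. (31) with beta^a = R T B^a, so (T/beta^a) dbeta^a/dT = 1 + T B'/B."""
    return 1.0 - 2.0 * (1.0 + T * dB(T) / B(T))


def G(T):
    """Eq. (55)."""
    return -R * B(T)**2 * (3.0 + 2.0 * T * dB(T) / B(T))


def f_o(T):
    """Eq. (46)."""
    return 3.0 * R * T * B(T)**2 * ln_psi(T)


def theta_f(T):
    """Eq. (59), positive branch."""
    return b / ln_psi(T)


def d(func, T, h=1e-3):
    """Central-difference derivative."""
    return (func(T + h) - func(T - h)) / (2.0 * h)


def residual_44(T):
    """df_o/dT + F f_o/T + G, which should vanish by eq. (44)."""
    return d(f_o, T) + F(T) * f_o(T) / T + G(T)


def residual_45(T):
    """dtheta_f/dT - theta_f G/f_o, which should vanish by eqs. (44)-(45)."""
    return d(theta_f, T) - theta_f(T) * G(T) / f_o(T)
```

With these illustrative values, at $T = 5000$ K both residuals are at the level of finite-difference round-off and $\theta_f$ is about 0.026, small as the approximation requires.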

The relativistic third virial coefficient can be obtained from equations (50) through (52) as

$$ \tilde C = Ce^{j\theta_C} = C_R + jC_I \tag{60} $$

$$ C_R = C\cos\theta_C \tag{61} $$

$$ C_I = C\sin\theta_C \tag{62} $$

and therefore

$$ \tan\theta_C = C_I/C_R \tag{63} $$

$$ \sin\theta_C = C_I/C \tag{64} $$

$$ \cos\theta_C = C_R/C \tag{65} $$

where $C_R$ and $C_I$ are given by equations (51) and (52), and where the magnitude of the third virial coefficient is given by

$$ C = \left[\left(C^a - 3(B^a)^2\ln\psi^a\right)^2 + 6C^a(B^a)^2\ln\psi^a\left(1 - \cos\theta_f\right)\right]^{1/2} \tag{66} $$

when a single phase $\theta_f$ of the gas is present. At high temperatures $C_R < 0$, and either $C_I < 0$ for positive $\theta_f$, or $C_I > 0$ for $\theta_f < 0$. For $C_I < 0$ it follows from equations (63) through (65) that $\theta_C = \pi + \theta_f$ and $\theta_C$ is in the third quadrant, while for $C_I > 0$ it follows that $\theta_C = \pi - |\theta_f|$ is in the second quadrant. Therefore equations (37) and (38) have degenerate solutions, and the real gases can appear in two states corresponding to $\pm\theta_f$. If a real gas is in a state which is a mixture of fraction $\alpha$ with $\theta_f > 0$ and fraction $(1 - \alpha)$ with $\theta_f < 0$, it follows that

$$ C_I(T) = -3(2\alpha - 1)(B^a)^2\ln\psi^a\sin|\theta_f| \tag{52A} $$

Therefore for equal mixtures of both phases $C_I = 0$ and $\theta_C = \pi$. The function $C_R(T)$ is determined in static pressure versus volume measurements at constant temperature, and is not affected by the sign of the phase function $\theta_f$. Finally, it should be noted that the fourth virial coefficient can be written as

$$ \tilde D = De^{j\theta_D} \tag{67} $$

but no calculations to determine $D$ and $\theta_D$ have been done.
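Equations (50), (52A) and (66) can be evaluated directly for given values of $C^a$, $B^a$, $\ln\psi^a$ and $\theta_f$. The short sketch below (hypothetical function names; the numbers in the checks are illustrative only) confirms that equation (66) agrees with the modulus of equation (50), that $\tilde C$ lies in the third quadrant for $\theta_f > 0$ when $C_R < 0$, and that an equal mixture of the $\pm\theta_f$ states has $C_I = 0$.

```python
import cmath
import math


def third_virial(C_a, B_a, ln_psi, theta_f):
    """Eq. (50): C~ = C^a - 3 (B^a)^2 ln(psi^a) e^{j theta_f} (high-T approximation)."""
    return C_a - 3.0 * B_a**2 * ln_psi * cmath.exp(1j * theta_f)


def third_virial_magnitude(C_a, B_a, ln_psi, theta_f):
    """Eq. (66) for a single phase theta_f.

    With L = 3 (B^a)^2 ln(psi^a), the last term 6 C^a (B^a)^2 ln(psi^a)(1 - cos)
    equals 2 C^a L (1 - cos).
    """
    L = 3.0 * B_a**2 * ln_psi
    return math.sqrt((C_a - L)**2 + 2.0 * C_a * L * (1.0 - math.cos(theta_f)))


def imaginary_part_mixture(alpha, B_a, ln_psi, theta_f):
    """Eq. (52A): fraction alpha at +|theta_f| and (1 - alpha) at -|theta_f|."""
    return -3.0 * (2.0 * alpha - 1.0) * B_a**2 * ln_psi * math.sin(abs(theta_f))
```

Note that `cmath.phase` returns angles in $(-\pi, \pi]$; reducing the result modulo $2\pi$ gives the convention $\theta_C = \pi + \theta_f$ used in the text.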

3. INTERNAL PHASE ANGLES OF THERMODYNAMIC FUNCTIONS. This section considers the calculation of the pressure, internal energy, entropy, enthalpy, and free energy of real gases with broken internal symmetry. The relativistic pressure and internal energy density for a broken symmetry real gas are given in equations (13) and (14). The corresponding expressions for the entropy density, enthalpy density and free energy density are given for asymmetric real gases as a generalization of the standard results in the literature. The basic thermodynamic functions for a broken symmetry real gas are then given as follows.

A. Pressure.

The pressure is written as

$$ \tilde P = Pe^{j\theta_P} = nRT\left(1 + nB + n^2\tilde C + n^3\tilde D + \cdots\right) \tag{68} $$

or in component form

$$ P_R = nRT\left(1 + nB + n^2C\cos\theta_C + n^3D\cos\theta_D + \cdots\right) \tag{69} $$

$$ P_I = nRT\left(n^2C\sin\theta_C + n^3D\sin\theta_D + \cdots\right) \tag{70} $$

$$ \tan\theta_P = \frac{n^2\left(C\sin\theta_C + nD\sin\theta_D + \cdots\right)}{1 + nB + n^2C\cos\theta_C + n^3D\cos\theta_D + \cdots} \tag{71} $$

$$ \tan\theta_P \approx n^2C\sin\theta_C \quad \text{(low density)} \tag{72} $$

The magnitude of the pressure is given by

$$ P = \sqrt{P_R^2 + P_I^2} \tag{73} $$

It is the real part of the pressure that is measured in laboratory experiments. For $\theta_f > 0$ and $\theta_C = \pi + \theta_f$ it follows from equations (69) through (72) that $\theta_P < 0$ in the regions of high temperature, but for the case $\theta_f < 0$ and $\theta_C = \pi - |\theta_f|$ it follows that $\theta_P > 0$. For an equal mixture of states with $\theta_f > 0$ and $\theta_f < 0$ it follows that $\theta_C = \pi$ and $\theta_P = 0$, neglecting the effects of the fourth and higher virial coefficients.
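A short Python sketch of equations (68) through (72) shows how the exact phase $\theta_P$ compares with the low-density form (72), and how its sign follows $\theta_C = \pi \pm |\theta_f|$. It assumes hypothetical values of $n$, $T$, $B$, $C$ and $\theta_C$, and neglects the fourth virial term unless `D` is supplied.

```python
import cmath
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def complex_pressure(n, T, B, C, theta_C, D=0.0, theta_D=0.0):
    """Eq. (68), truncated after the fourth virial term.

    B is real (eq. (25)); C, theta_C and D, theta_D are magnitudes and phases.
    """
    C_tilde = C * cmath.exp(1j * theta_C)
    D_tilde = D * cmath.exp(1j * theta_D)
    return n * R * T * (1.0 + n * B + n**2 * C_tilde + n**3 * D_tilde)


def pressure_phase(P_tilde):
    """Exact tan(theta_P) = P_I / P_R, eq. (71)."""
    return P_tilde.imag / P_tilde.real


def pressure_phase_low_density(n, C, theta_C):
    """Low-density form, eq. (72)."""
    return n**2 * C * math.sin(theta_C)
```

The measured pressure is `P_tilde.real`, that is $P\cos\theta_P$. At low density the exact and approximate phases differ only by the factor $1 + nB + n^2C\cos\theta_C$ in the denominator of equation (71).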

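B. Internal Energy Density.

The energy density for an asymmetric real gas is written as

$$ \tilde E = Ee^{j\theta_E} = nRT\left(\frac{3}{2} - nT\frac{dB}{dT} - \frac{1}{2}n^2T\frac{\partial\tilde C}{\partial T} - \frac{1}{3}n^3T\frac{\partial\tilde D}{\partial T} - \cdots\right) \tag{74} $$

Using equations (60) and (67) in equation (74) gives

$$ E_R = nRT\left[\frac{3}{2} - nT\frac{dB}{dT} - \frac{1}{2}n^2I_C\cos(\theta_C + \beta_{C,T}) - \frac{1}{3}n^3I_D\cos(\theta_D + \beta_{D,T}) - \cdots\right] \tag{75} $$

$$ E_I = -n^3RT\left[\frac{1}{2}I_C\sin(\theta_C + \beta_{C,T}) + \frac{1}{3}nI_D\sin(\theta_D + \beta_{D,T}) + \cdots\right] \tag{76} $$

$$ \tan\theta_E = E_I/E_R \tag{77} $$

$$ \tan\theta_E \approx -\frac{1}{3}n^2I_C\sin(\theta_C + \beta_{C,T}) \quad \text{(low density)} \tag{78} $$

where

$$ I_C = \sqrt{\left(T\frac{\partial C}{\partial T}\right)^2 + \left(CT\frac{\partial\theta_C}{\partial T}\right)^2} \tag{79} $$

$$ I_D = \sqrt{\left(T\frac{\partial D}{\partial T}\right)^2 + \left(DT\frac{\partial\theta_D}{\partial T}\right)^2} \tag{80} $$

$$ \tan\beta_{C,T} = \frac{C\,\partial\theta_C/\partial T}{\partial C/\partial T} \tag{81} $$

$$ \tan\beta_{D,T} = \frac{D\,\partial\theta_D/\partial T}{\partial D/\partial T} \tag{82} $$

The magnitude of the internal energy density is given by

$$ E = \sqrt{E_R^2 + E_I^2} \tag{83} $$

It is $E_R$ that is measured in the laboratory.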
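Equations (75) through (82) are the real and imaginary parts of equation (74), with $T\,\partial\tilde C/\partial T$ written as $I_Ce^{j(\theta_C + \beta_{C,T})}$. The sketch below computes $\tilde E$ both ways so the decomposition can be checked numerically; the fourth virial terms are dropped, and the callables `C`, `theta_C` and the numbers used in the check are hypothetical.

```python
import cmath
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def complex_C(C, theta_C):
    """Callable C~(T) = C(T) e^{j theta_C(T)}, eq. (60)."""
    def C_tilde(T):
        return C(T) * cmath.exp(1j * theta_C(T))
    return C_tilde


def energy_density_direct(n, T, dBdT, C_tilde, h=None):
    """Eq. (74) with the fourth and higher virial terms dropped."""
    if h is None:
        h = 1e-5 * T
    dC = (C_tilde(T + h) - C_tilde(T - h)) / (2.0 * h)
    return n * R * T * (1.5 - n * T * dBdT - 0.5 * n**2 * T * dC)


def energy_density_components(n, T, dBdT, C, theta_C, h=None):
    """Eqs. (75)-(76) with I_C and beta_{C,T} from eqs. (79) and (81); D terms dropped."""
    if h is None:
        h = 1e-5 * T
    dC = (C(T + h) - C(T - h)) / (2.0 * h)
    dth = (theta_C(T + h) - theta_C(T - h)) / (2.0 * h)
    I_C = math.hypot(T * dC, C(T) * T * dth)   # eq. (79)
    beta_CT = math.atan2(C(T) * dth, dC)       # eq. (81), quadrant from atan2
    phase = theta_C(T) + beta_CT
    E_R = n * R * T * (1.5 - n * T * dBdT - 0.5 * n**2 * I_C * math.cos(phase))
    E_I = -n**3 * R * T * 0.5 * I_C * math.sin(phase)
    return E_R, E_I
```

The same pattern, with $C + T\,\partial C/\partial T$ or $2C - T\,\partial C/\partial T$ in place of $T\,\partial C/\partial T$, reproduces the $J_C$, $\alpha_{C,T}$ and $K_C$, $\eta_C$ decompositions used for the entropy and enthalpy below.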

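C. Entropy.

The entropy density for an asymmetric real gas is written as

$$ \tilde s = se^{j\theta_s} = -nR\left[\ln(nRT) + n\left(B + T\frac{dB}{dT}\right) + \frac{1}{2}n^2\left(\tilde C + T\frac{\partial\tilde C}{\partial T}\right) + \frac{1}{3}n^3\left(\tilde D + T\frac{\partial\tilde D}{\partial T}\right) + \cdots\right] \tag{84} $$

Using equations (60) and (67) in equation (84) gives

$$ s_R = -nR\left[\ln(nRT) + n\left(B + T\frac{dB}{dT}\right) + \frac{1}{2}n^2J_C\cos(\theta_C + \alpha_{C,T}) + \frac{1}{3}n^3J_D\cos(\theta_D + \alpha_{D,T}) + \cdots\right] \tag{85} $$

$$ s_I = -n^3R\left[\frac{1}{2}J_C\sin(\theta_C + \alpha_{C,T}) + \frac{1}{3}nJ_D\sin(\theta_D + \alpha_{D,T}) + \cdots\right] \tag{86} $$

$$ \tan\theta_s = \frac{s_I}{s_R} \approx \frac{n^2J_C\sin(\theta_C + \alpha_{C,T})}{2\ln(nRT)} \quad \text{(low density)} \tag{87} $$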
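where $nRT > 1$ and where

$$ J_C = \sqrt{\left(C + T\frac{\partial C}{\partial T}\right)^2 + \left(CT\frac{\partial\theta_C}{\partial T}\right)^2} \tag{88} $$

$$ J_D = \sqrt{\left(D + T\frac{\partial D}{\partial T}\right)^2 + \left(DT\frac{\partial\theta_D}{\partial T}\right)^2} \tag{89} $$

$$ \tan\alpha_{C,T} = \frac{CT\,\partial\theta_C/\partial T}{C + T\,\partial C/\partial T} \tag{90} $$

$$ \tan\alpha_{D,T} = \frac{DT\,\partial\theta_D/\partial T}{D + T\,\partial D/\partial T} \tag{91} $$

The magnitude of the entropy per unit volume is given by

$$ s = \sqrt{s_R^2 + s_I^2} \tag{92} $$

The real quantity $s_R$ is the entropy density measured in the laboratory.

D. Enthalpy.

The enthalpy density of an asymmetric real gas is given by

$$ \tilde h = he^{j\theta_h} = nRT\left[\frac{5}{2} + n\left(B - T\frac{dB}{dT}\right) + \frac{1}{2}n^2\left(2\tilde C - T\frac{\partial\tilde C}{\partial T}\right) + \frac{1}{3}n^3\left(3\tilde D - T\frac{\partial\tilde D}{\partial T}\right) + \cdots\right] \tag{93} $$

Placing equations (60) and (67) into equation (93) gives

$$ h_R = nRT\left[\frac{5}{2} + n\left(B - T\frac{dB}{dT}\right) + \frac{1}{2}n^2K_C\cos(\theta_C - \eta_C) + \frac{1}{3}n^3K_D\cos(\theta_D - \eta_D) + \cdots\right] \tag{94} $$

$$ h_I = n^3RT\left[\frac{1}{2}K_C\sin(\theta_C - \eta_C) + \frac{1}{3}nK_D\sin(\theta_D - \eta_D) + \cdots\right] \tag{95} $$

$$ \tan\theta_h = h_I/h_R \tag{96} $$

$$ \tan\theta_h \approx \frac{1}{5}n^2K_C\sin(\theta_C - \eta_C) \quad \text{(low density)} \tag{97} $$

where

$$ K_C = \sqrt{\left(2C - T\frac{\partial C}{\partial T}\right)^2 + \left(CT\frac{\partial\theta_C}{\partial T}\right)^2} \tag{98} $$

$$ K_D = \sqrt{\left(3D - T\frac{\partial D}{\partial T}\right)^2 + \left(DT\frac{\partial\theta_D}{\partial T}\right)^2} \tag{99} $$

$$ \tan\eta_C = \frac{CT\,\partial\theta_C/\partial T}{2C - T\,\partial C/\partial T} \tag{100} $$

$$ \tan\eta_D = \frac{DT\,\partial\theta_D/\partial T}{3D - T\,\partial D/\partial T} \tag{101} $$

The magnitude of the enthalpy density is given by

$$ h = \sqrt{h_R^2 + h_I^2} \tag{102} $$

The value of $h_R$ is obtained from laboratory measurements.

E. Free Energy Density.

The complex number free energy density for an asymmetric real gas is given by

$$ \tilde a = ae^{j\theta_a} = nRT\left[\frac{3}{2} + \ln(nRT) + nB + \frac{1}{2}n^2\tilde C + \frac{1}{3}n^3\tilde D + \cdots\right] \tag{103} $$

The real and imaginary parts of equation (103) are

$$ a_R = nRT\left[\frac{3}{2} + \ln(nRT) + nB + \frac{1}{2}n^2C\cos\theta_C + \frac{1}{3}n^3D\cos\theta_D + \cdots\right] \tag{104} $$

$$ a_I = n^3RT\left(\frac{1}{2}C\sin\theta_C + \frac{1}{3}nD\sin\theta_D + \cdots\right) \tag{105} $$

$$ \tan\theta_a = a_I/a_R \tag{105A} $$

$$ \tan\theta_a \approx \frac{\tfrac{1}{2}n^2C\sin\theta_C}{\tfrac{3}{2} + \ln(nRT)} \quad \text{(low density)} \tag{106} $$

The real part of the free energy density is a physically measurable quantity.

4. HEAT CAPACITY AND GRÜNEISEN PARAMETER. The calculation of the heat capacity and Grüneisen parameter for a real gas with broken internal symmetries is performed in this section. The complex number molar heat capacity is obtained from equation (74) to be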


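$$ \tilde c_V = c_Ve^{j\theta_{cV}} = c_{VR} + jc_{VI} = R\left(\frac{3}{2} - n\Gamma_1 - \frac{1}{2}n^2\tilde\Gamma_2 - \cdots\right) \tag{108} $$

where

$$ \Gamma_1 = T^2\frac{d^2B}{dT^2} + 2T\frac{dB}{dT} \tag{109} $$

$$ \tilde\Gamma_2 = T^2\frac{\partial^2\tilde C}{\partial T^2} + 2T\frac{\partial\tilde C}{\partial T} = \Gamma_{2R} + j\Gamma_{2I} \tag{110} $$

Substituting equation (60) into equation (110) gives

$$ \tilde\Gamma_2 = I_{2C}e^{j\theta_{2C}} + 2I_Ce^{j\theta_{1C}} \tag{111} $$

$$ \Gamma_{2R} = I_{2C}\cos\theta_{2C} + 2I_C\cos\theta_{1C} \tag{112} $$

$$ \Gamma_{2I} = I_{2C}\sin\theta_{2C} + 2I_C\sin\theta_{1C} \tag{113} $$

where $I_C$ is given by equation (79) and

$$ I_{2C} = T^2\left[\left(\frac{\partial^2C}{\partial T^2} - C\left(\frac{\partial\theta_C}{\partial T}\right)^2\right)^2 + \left(2\frac{\partial C}{\partial T}\frac{\partial\theta_C}{\partial T} + C\frac{\partial^2\theta_C}{\partial T^2}\right)^2\right]^{1/2} \tag{114} $$

$$ \theta_{2C} = \theta_C + \beta_{2C,T} \tag{115} $$

$$ \tan\beta_{2C,T} = \frac{2\,(\partial C/\partial T)(\partial\theta_C/\partial T) + C\,\partial^2\theta_C/\partial T^2}{\partial^2C/\partial T^2 - C\,(\partial\theta_C/\partial T)^2} \tag{116} $$

$$ \theta_{1C} = \theta_C + \beta_{C,T} \tag{117} $$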
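where $\beta_{C,T}$ is given by equation (81). Then it follows from equation (108) that

$$ c_{VR} = R\left(\frac{3}{2} - n\Gamma_1 - \frac{1}{2}n^2\Gamma_{2R}\right) \tag{118} $$

$$ c_{VI} = -\frac{1}{2}n^2R\,\Gamma_{2I} \tag{119} $$

$$ \tan\theta_{cV} = c_{VI}/c_{VR} \tag{120} $$

$$ \tan\theta_{cV} \approx -\frac{1}{3}n^2\left(I_{2C}\sin\theta_{2C} + 2I_C\sin\theta_{1C}\right) \tag{121} $$

where equation (121) is valid for low densities.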
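The magnitude-and-phase form of $\tilde\Gamma_2$ follows from differentiating $\tilde C = Ce^{j\theta_C}$ twice: $T^2\,\partial^2\tilde C/\partial T^2 = T^2[(C'' - C\theta_C'^2) + j(2C'\theta_C' + C\theta_C'')]e^{j\theta_C}$ and $2T\,\partial\tilde C/\partial T = 2I_Ce^{j(\theta_C + \beta_{C,T})}$. The sketch below, which assumes hypothetical callables for $C(T)$ and $\theta_C(T)$ and uses finite-difference derivatives, compares $\tilde\Gamma_2$ computed directly from equation (110) with that component form.

```python
import cmath
import math


def complex_C(C, theta_C):
    """Callable C~(T) = C(T) e^{j theta_C(T)}, eq. (60)."""
    def C_tilde(T):
        return C(T) * cmath.exp(1j * theta_C(T))
    return C_tilde


def gamma2_direct(T, C_tilde, h=None):
    """Eq. (110): Gamma_2 = T^2 C~'' + 2 T C~' by central differences."""
    if h is None:
        h = 1e-4 * T
    c_plus, c0, c_minus = C_tilde(T + h), C_tilde(T), C_tilde(T - h)
    d1 = (c_plus - c_minus) / (2 * h)
    d2 = (c_plus - 2 * c0 + c_minus) / h**2
    return T**2 * d2 + 2 * T * d1


def gamma2_components(T, C, theta_C, h=None):
    """Gamma_2R and Gamma_2I from I_2C e^{j theta_2C} + 2 I_C e^{j theta_1C}."""
    if h is None:
        h = 1e-4 * T
    c0 = C(T)
    dC = (C(T + h) - C(T - h)) / (2 * h)
    d2C = (C(T + h) - 2 * c0 + C(T - h)) / h**2
    th0 = theta_C(T)
    dth = (theta_C(T + h) - theta_C(T - h)) / (2 * h)
    d2th = (theta_C(T + h) - 2 * th0 + theta_C(T - h)) / h**2
    # Second-derivative term: T^2 [(C'' - C th'^2) + j(2 C' th' + C th'')] e^{j th}
    re2 = d2C - c0 * dth**2
    im2 = 2 * dC * dth + c0 * d2th
    I_2C = T**2 * math.hypot(re2, im2)
    theta_2C = th0 + math.atan2(im2, re2)
    # First-derivative term: T C~' = I_C e^{j(theta_C + beta_CT)}, eqs. (79), (81)
    I_C = T * math.hypot(dC, c0 * dth)
    theta_1C = th0 + math.atan2(c0 * dth, dC)
    gamma_R = I_2C * math.cos(theta_2C) + 2 * I_C * math.cos(theta_1C)
    gamma_I = I_2C * math.sin(theta_2C) + 2 * I_C * math.sin(theta_1C)
    return gamma_R, gamma_I
```

The heat capacity components then follow as $c_{VR} = R(3/2 - n\Gamma_1 - n^2\Gamma_{2R}/2)$ and $c_{VI} = -n^2R\,\Gamma_{2I}/2$.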

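The Grüneisen parameter for an asymmetric real gas can be written as

$$ \tilde\gamma = \gamma e^{j\theta_\gamma} = \gamma_R + j\gamma_I = \frac{v}{\tilde c_V}\left(\frac{\partial\tilde P}{\partial T}\right)_v = \frac{2}{3}\left(1 + n\gamma_1 + n^2\tilde\gamma_2 + \cdots\right) \tag{122} $$

where

$$ \gamma_1 = f_1 + g_1 \tag{123} $$

$$ \tilde\gamma_2 = \tilde f_2 + \tilde g_2 + f_1g_1 \tag{124} $$

$$ f_1 = B + T\frac{dB}{dT} \tag{125} $$

$$ g_1 = \frac{2}{3}\left(T^2\frac{d^2B}{dT^2} + 2T\frac{dB}{dT}\right) \tag{126} $$

$$ \tilde f_2 = \tilde C + T\frac{\partial\tilde C}{\partial T} \tag{127} $$

$$ \tilde g_2 = \frac{1}{3}\left(T^2\frac{\partial^2\tilde C}{\partial T^2} + 2T\frac{\partial\tilde C}{\partial T}\right) + \frac{4}{9}\left(T^2\frac{d^2B}{dT^2} + 2T\frac{dB}{dT}\right)^2 \tag{128} $$

From equation (122) it follows that

$$ \gamma_R = \frac{2}{3}\left(1 + n\gamma_1 + n^2\gamma_{2R} + \cdots\right) \tag{129} $$

$$ \gamma_I = \frac{2}{3}n^2\gamma_{2I} \tag{130} $$

where

$$ \gamma_{2R} = f_{2R} + g_{2R} + f_1g_1 \tag{131} $$

$$ \gamma_{2I} = f_{2I} + g_{2I} \tag{132} $$

$$ f_{2R} = J_C\cos(\theta_C + \alpha_{C,T}) \tag{133} $$

$$ f_{2I} = J_C\sin(\theta_C + \alpha_{C,T}) \tag{134} $$

$$ g_{2R} = \frac{1}{3}\left(I_{2C}\cos\theta_{2C} + 2I_C\cos\theta_{1C}\right) + \frac{4}{9}\Gamma_1^2 \tag{135} $$

$$ g_{2I} = \frac{1}{3}\left(I_{2C}\sin\theta_{2C} + 2I_C\sin\theta_{1C}\right) \tag{136} $$

where $\Gamma_1$ is given by equation (109), $J_C$ and $\alpha_{C,T}$ are given by equations (88) and (90), $I_C$ and $\theta_{1C}$ are given by equations (79) and (117) respectively, and $I_{2C}$ and $\theta_{2C}$ are given by equations (114) and (115) respectively.

5. SUPERPOSITION OF THERMODYNAMIC FUNCTIONS. Consider a mixture of two interacting gases with broken internal symmetries. There will be interactions between the two species of gas as well as self-interactions within each species. Therefore the total pressure and internal energy are written as

$$ \tilde P = \tilde P_1 + \tilde P_2 + \tilde P_{12} \tag{137} $$

$$ \tilde U = \tilde U_1 + \tilde U_2 + \tilde U_{12} \tag{138} $$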

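where $\tilde P_{12}$ and $\tilde U_{12}$ = interspecies interaction pressure and internal energy, respectively. For asymmetric real gases the terms in equations (137) and (138) are written as

$$ \tilde P = Pe^{j\theta_P}, \qquad \tilde P_1 = P_1e^{j\theta_{P1}} \tag{139} $$

$$ \tilde P_2 = P_2e^{j\theta_{P2}}, \qquad \tilde P_{12} = P_{12}e^{j\theta_{P12}} \tag{140} $$

$$ \tilde U = Ue^{j\theta_U}, \qquad \tilde U_1 = U_1e^{j\theta_{U1}} \tag{141} $$

$$ \tilde U_2 = U_2e^{j\theta_{U2}}, \qquad \tilde U_{12} = U_{12}e^{j\theta_{U12}} \tag{142} $$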
Equations (137) and (138) can be written in component form as follows

$$ P\cos\theta_P = P_1\cos\theta_{P1} + P_2\cos\theta_{P2} + P_{12}\cos\theta_{P12} \tag{143} $$

$$ P\sin\theta_P = P_1\sin\theta_{P1} + P_2\sin\theta_{P2} + P_{12}\sin\theta_{P12} \tag{144} $$

$$ U\cos\theta_U = U_1\cos\theta_{U1} + U_2\cos\theta_{U2} + U_{12}\cos\theta_{U12} \tag{145} $$

$$ U\sin\theta_U = U_1\sin\theta_{U1} + U_2\sin\theta_{U2} + U_{12}\sin\theta_{U12} \tag{146} $$

From equations (143) through (146) it follows that

$$ P^2 = P_1^2 + P_2^2 + P_{12}^2 + 2P_1P_2\cos(\theta_{P1} - \theta_{P2}) + 2P_1P_{12}\cos(\theta_{P1} - \theta_{P12}) + 2P_2P_{12}\cos(\theta_{P2} - \theta_{P12}) $$

with an analogous expression for $U^2$ in terms of $U_1$, $U_2$, $U_{12}$ and their phases.
Therefore mixtures of two asymmetric real gases should exhibit interference in regard to the component pressures and internal energies, and the magnitudes of the total pressure and internal energy should exhibit a small oscillation as the density of the interacting mixture is increased. This is also true of the measured pressure and internal energy, which can respectively be written as $P\cos\theta_P$ and $U\cos\theta_U$.
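The interference described above is the behavior of a sum of phasors. The minimal sketch below (hypothetical function names; magnitudes and phases are supplied by the caller) computes the magnitude of a sum such as $\tilde P_1 + \tilde P_2 + \tilde P_{12}$ and the same quantity written out with the cross terms $2P_iP_k\cos(\theta_i - \theta_k)$, showing that the total depends on the relative phases of the components and not only on their magnitudes.

```python
import cmath
import math


def superpose(components):
    """Eqs. (137)/(138): magnitude and phase of a sum of terms m e^{j theta}."""
    total = sum(m * cmath.exp(1j * th) for m, th in components)
    return abs(total), cmath.phase(total)


def magnitude_squared(components):
    """|sum|^2 written out with the interference terms 2 m_i m_k cos(th_i - th_k)."""
    result = sum(m * m for m, _ in components)
    for i, (mi, ti) in enumerate(components):
        for mk, tk in components[i + 1:]:
            result += 2.0 * mi * mk * math.cos(ti - tk)
    return result
```

If the relative phases change with density, as $\theta_P$ does through equation (71), the cosine cross terms change sign and the total magnitude rises and falls, which is the oscillation referred to in the text.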

6. CONCLUSIONS. Real gases are expected to exhibit broken internal symmetries that manifest themselves as internal phase angles associated with thermodynamic functions such as pressure, internal energy and entropy. These internal phase angles arise from the third and higher virial coefficients and are due to renormalization effects associated with the interaction of matter with spacetime, as described by equation (2). The ideal gas and the second virial coefficient of an interacting gas are unaffected by spacetime interactions. The phase angle associated with the third virial coefficient can be determined from the solution of two simultaneous first order differential equations. These equations are generally not easy to solve analytically, but they yield a simple solution for the high temperature regions of the real gas, where the phase angle of the third virial coefficient is small. The virial form of the state equation for real gases allows the phase angles associated with the pressure, internal energy, entropy, enthalpy, and free energy to be calculated in terms of density and temperature. The existence of internal phase angles for the thermodynamic state functions suggests that mixtures of real gases will produce an interference phenomenon wherein the total pressure and internal energy oscillate slightly as the density of the system is increased.

It can also be conjectured that parabolic waves of the form

[Equations (155) and (156) are illegible in the source scan.]   (155), (156)

can exist in asymmetric gases and liquids as well as in asymmetric solids and quantum liquids, and these wave motions may have interesting applications to thermodynamics, hydrodynamics, and chemical and biological cycles. The internal phase angles of the pressure and other thermodynamic functions of the real gases are also expected to play an important role in the determination of the equilibrium configuration of stars and planets. This is true because the internal phase angles of the radial coordinates in a star are determined by the internal phase angle of the pressure. Therefore the complex number values of the third and higher virial coefficients are intimately involved in stellar equilibrium calculations.
GAUGE THEORY OF ATOMIC PROCESSES


Richard A. Weiss

U.S. Army Engineer Waterways Experiment Station
Vicksburg, Mississippi 39180


ABSTRACT. Atomic particle processes that occur within bulk matter or the vacuum are expected to be influenced by the broken symmetries of the thermodynamic ground and excited states of these systems. Internal phase angles are associated with the space and time coordinates and the kinematic and dynamic variables of particles and radiation in bulk matter or vacuum with broken internal symmetries. A broken symmetry photon gas in bulk matter or vacuum is considered, and the radiation pressure and energy density are calculated. The geometric angles between kinematic variables and between dynamic variables have internal phase angles. This affects the description of the photoelectric effect, Compton effect, and Coulomb scattering in bulk matter and the vacuum. Thomson, Compton, Rutherford, Mott, Bhabha, and Møller scattering processes in broken symmetry systems are investigated. The Schrödinger and Dirac equations are developed for a particle located in bulk matter or vacuum with broken internal symmetries. This work will have applications to nuclear explosions and the interaction of directed energy beams with matter.

1. INTRODUCTION. In the past decade, great advances were made in the theory of the elementary forces which bind the universe. These advances developed through the realization that gauge theory is the natural framework for describing the four basic interactions that occur in nature. Gauge theory was first formulated many years ago by Hermann Weyl, but only recently has its real importance to physics been understood. In some cases when gauge symmetry is broken spontaneously by some special set of forces, a coherent state of matter can be formed, as in the case of superconductivity, where the Cooper pairs of electrons break the ground state gauge symmetry through electron-phonon interactions.

It has been suggested that vacuum interactions with bulk matter may produce a coherent ground state which is described by thermodynamic functions that possess internal phase angles. This coherent broken symmetry ground state can possibly influence the microscopic processes that take place in bulk matter.

The effect can occur in two ways. First, through the Euler equations, by which fluid elements are expected to have space and time coordinates and kinematic and dynamic variables that have internal phase angles. Second, a microscopic gauge interaction between material particles can be induced by Minkowski spacetime, and this complex number gauge interaction will impart internal phases to the space and time coordinates and to the kinematic and dynamic variables of individual particles. Therefore individual particles in bulk matter require complex numbers for their kinematic and dynamic descriptions and for their coordinate locations in space and time. The same conclusions are valid for the vacuum with broken internal symmetries, because the vacuum can be considered as a special simplified case of bulk matter.




The coherent state of bulk matter is due to spacetime interactions, and these have been described by a bulk matter relativistic trace equation whose scalar form for symmetrical bulk matter is

[Equation (1) is illegible in the source scan.]   (1)

where $U_s$ = renormalized internal energy for symmetric bulk matter, $P_s$ = renormalized pressure for symmetric bulk matter, $T$ = absolute temperature, $V$ = volume of substance, and $U^a$ and $P^a$ = corresponding nonrelativistic internal energy and pressure. Throughout this paper the index "a" refers to nonrelativistic (unrenormalized) calculations. The complex number form of the relativistic trace equation that describes the coherent broken symmetry state of bulk matter is given by

[Equation (2) is missing from the source scan.]   (2)

or equivalently as

$$ \left(1 - \tilde b + T\frac{\partial}{\partial T} - \tilde bV\frac{\partial}{\partial V}\right)\tilde E - 3\left(1 + \tilde\gamma + V\frac{\partial}{\partial V} - \tilde\gamma T\frac{\partial}{\partial T}\right)\tilde P = [\text{right-hand side illegible in the source scan}] \tag{3} $$

where

$$ [\text{illegible}] = \left(T\frac{\partial}{\partial T} - b^aV\frac{\partial}{\partial V} + 1 - b^a\right)E^a \tag{4} $$

and where $\tilde U$, $\tilde E$, $\tilde P$, $\tilde\gamma$, and $\tilde b$ are complex number representations of the internal energy, energy density, pressure, and the gauge parameters. With their right-hand sides set equal to zero, equations (2) and (3) describe the broken symmetry thermodynamic ground state of the vacuum. Therefore the broken symmetry thermodynamic ground state of the vacuum is a simpler special case of the broken symmetry state of bulk matter.

Because of the spacetime interactions with bulk matter, the single particle energy must contain a gauge potential that produces the difference between $U$ and $U^a$ at the macroscopic level. Corresponding to equation (1), the noninteracting single particle energy is given by

$$e_i^{s,\mathrm{free}} = \sqrt{c^2 p_s^2 + m^2 c^4} + V_g^s \tag{5}$$


where $p_s$ = single particle momentum for a symmetrical system, $c$ = light speed, $m$ = proper mass, and $V_g^s$ = scalar gauge potential for symmetrical matter. The gauge potential is zero when $PV = aU$, where $a$ is a constant and $U = U^a$. When it has a nonzero value, the gauge potential breaks the Lorentz symmetry of the system. The condition $V_g^s \neq 0$ holds for the general case of a noninteracting zero-temperature Fermi gas, for which $PV \neq aU$. The exceptions are the low-density nonrelativistic case and the high-density ultrarelativistic case, for which $PV = aU$ and $V_g^s = 0$. The single particle energy for the noninteracting case corresponding to equation (2) for broken symmetry matter is given by


$$\bar e_i^{\,\mathrm{free}} = \sqrt{c^2 \bar p^2 + m^2 c^4} + \bar V_g \tag{6}$$

$$\bar V_g = V_g e^{j\theta_{V_g}} \tag{7}$$

where $\bar p$ = complex-number single particle momentum, and $\bar V_g$ = complex-number gauge potential. Similarly, $\bar V_g = 0$ when $\bar P V = a\bar U$. The derivative terms in equations (1) and (2), required for gauge invariance, are what produce the spacetime interaction gauge potentials. These potentials prevent the single particle energy and momentum from being four-vectors in equations (5) and (6) when $PV \neq aU$ and $\bar P V \neq a\bar U$, respectively. For the interacting case, the single particle energy is written as

$$e_i^s = \sqrt{c^2 p_s^2 + m^2 c^4} + V_g^s + V_e^s \tag{8}$$

corresponding to equation (1) for a symmetrical system, and as

$$\bar e_i = \sqrt{c^2 \bar p^2 + m^2 c^4} + \bar V_g + \bar V_e \tag{9}$$

corresponding to the relativistic trace equation (2) for a system with broken internal symmetry. For the broken symmetry case the external potential is written as

$$\bar V_e = V_e e^{j\theta_{V_e}} \tag{10}$$

In this paper the index "s" refers to a symmetrical renormalized system.

The gauge potentials $V_g^s$ or $\bar V_g$ are determined indirectly from the solution of the trace equations (1) or (2).[7] Consider the trace equation (2). This equation is solved to determine the bulk matter internal energy $\bar U$ in terms of the unrenormalized internal energy $U^a$. The unrenormalized thermodynamic functions are determined from the unrenormalized partition function $Z^a$, which is given by[8,9]

$$Z^a = \int n\, e^{-\beta H^a(q_a, p_a)}\, dq_a\, dp_a \tag{11}$$

where $n$ = degeneracy, $\beta = 1/(kT)$, $q_a$ and $p_a$ = conventional generalized coordinates and momenta respectively, and the unrenormalized Hamiltonian is given by


$$H^a(q_a, p_a) = \sqrt{c^2 p_a^2 + m^2 c^4} + V_e^a \tag{12}$$

where $V_e^a$ = unrenormalized external potential, and[8,9]

$$U^a = -\frac{\partial \ln Z^a}{\partial \beta} \tag{13}$$

$$P^a = \frac{1}{\beta}\frac{\partial \ln Z^a}{\partial V} \tag{14}$$


The trace equation (2) is then used to determine the renormalized complex-number internal energy $\bar U$ and pressure $\bar P$. These values of $\bar U$ and $\bar P$ are then used to determine the complex-number renormalized partition function $\bar Z$ from

$$\bar U = -\frac{\partial \ln \bar Z}{\partial \beta} \tag{15}$$

$$\bar P = \frac{1}{\beta}\frac{\partial \ln \bar Z}{\partial V} \tag{16}$$

where

$$\bar Z = \int n\, e^{-\beta \bar H(\bar q, \bar p)}\, d\bar q\, d\bar p = Z e^{j\theta_Z} \tag{17}$$

and $\bar q$ and $\bar p$ = complex-number generalized coordinates and momenta respectively, and where the renormalized complex-number Hamiltonian is given by

$$\bar H(\bar q, \bar p) = \sqrt{c^2 \bar p^2 + m^2 c^4} + \bar V_g + \bar V_e \tag{18}$$

From equations (15), (16), and (17) it follows that

(19) 

(20) 

(21) 

(22) 
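The four relations (19) through (22) are not legible in this copy. The block below is a hedged sketch only and may not match the forms the author wrote. It assumes equations (15) and (16) take the canonical-ensemble forms $\bar U = -\partial\ln\bar Z/\partial\beta$ and $\bar P = \beta^{-1}\partial\ln\bar Z/\partial V$. It also writes $\bar U = U e^{j\theta_U}$ and $\bar P = P e^{j\theta_P}$; the symbol $\theta_U$ is introduced here only for illustration. Because equation (17) gives $\ln\bar Z = \ln Z + j\theta_Z$, separating real and imaginary parts yields four real equations in the two real unknowns $Z$ and $\theta_Z$:

```latex
\begin{aligned}
U\cos\theta_U &= -\frac{\partial \ln Z}{\partial \beta}, &
U\sin\theta_U &= -\frac{\partial \theta_Z}{\partial \beta}, \\
P\cos\theta_P &= \frac{1}{\beta}\,\frac{\partial \ln Z}{\partial V}, &
P\sin\theta_P &= \frac{1}{\beta}\,\frac{\partial \theta_Z}{\partial V}.
\end{aligned}
```

In this reading the real parts fix $Z$ and the imaginary parts fix $\theta_Z$. That is the role the text assigns to equations (19) through (22).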
Equations (19) through (22) can therefore be used to determine $Z$ and $\theta_Z$ for a relativistic system with broken internal symmetry. From a knowledge of $\bar Z$, equations (17) and (18) are then inverted to determine $\bar V_g$ and $\bar V_e$. In summary,

$$Z^a \rightarrow P^a, U^a \rightarrow \bar P, \bar U \rightarrow \bar Z \rightarrow \bar V_g, \bar V_e \tag{23}$$

Because $\bar V_g$ and $\bar V_e$ are complex numbers, the coordinates of space and time are also expected to be complex numbers. The measured quantities are the real parts of the complex-number quantities, such as coordinates, momentum, energy, pressure, frequency, angles, and scattering cross sections.

Therefore, macroscopic local gauge invariance suggests the existence of a symmetry breaking microscopic gauge potential. The macroscopic broken symmetry state required by equation (2) also suggests how certain quantities should be represented. These are the space and time coordinates and the kinematic and dynamic quantities, such as single particle velocity, acceleration, and force. They should be represented by complex numbers that include a description of internal phase angles. This paper indicates the effects of microscopic internal phase angles on the photon gas and on elementary atomic processes such as the photoelectric effect, the Compton effect, and Coulomb scattering. The forms of the Dirac and Schrödinger equations are developed for particles in bulk matter or vacuum with broken symmetry.

2. BROKEN SYMMETRY PHOTON GAS IN BULK MATTER. This section describes a photon gas with broken internal symmetry interacting with bulk matter that also has internal phase angles. The spectral energy density of a symmetrical photon gas is given by Planck's law as follows

$$E_{\nu s} = \frac{A}{e^{h\nu/kT} - 1} = E_{\nu s}^{(v)} = E_{\nu a} = E_{\nu a}^{(v)} \tag{24}$$

where

$$A = \frac{8\pi h \nu^3}{c^3} \tag{25}$$


and where $E_{\nu s}$ = spectral energy density of radiation in symmetric matter, $E_{\nu s}^{(v)}$ = spectral energy density in the symmetric vacuum, $h$ = Planck's constant, $k$ = Boltzmann's constant, and $T$ = absolute temperature. For this case the total energy density is given by the Stefan-Boltzmann law[10,11]

$$E_{rs} = \int_0^\infty E_{\nu s}\, d\nu = \sigma T^4 = E_{rs}^{(v)} = E_{ra} = E_{ra}^{(v)} \tag{26}$$

where $E_{rs}$ and $E_{rs}^{(v)}$ = total energy density for a symmetrical photon gas in symmetric matter and in the symmetric vacuum respectively, and $\sigma$ = Stefan-Boltzmann constant. For the symmetrical vacuum, the pressure of the symmetrical photon gas is given by

$$P_{\nu s}^{(v)} = \tfrac{1}{3} E_{\nu s}^{(v)} = \tfrac{1}{3} E_{\nu a}^{(v)} \tag{27}$$

$$P_{rs}^{(v)} = \tfrac{1}{3} E_{rs}^{(v)} = \tfrac{1}{3} E_{ra}^{(v)} \tag{28}$$

where $P_{\nu s}^{(v)}$ and $P_{rs}^{(v)}$ = spectral and total radiation pressure respectively for the symmetrical photon gas in the symmetric vacuum.
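Equation (26) can be checked numerically. The sketch below integrates the symmetric Planck density of equations (24) and (25) with Simpson's rule and compares the result with the closed form. It assumes SI units and CODATA constants. In those units the coefficient of $T^4$ in the energy density comes out as $4\sigma/c$, where $\sigma \approx 5.670\times10^{-8}$ W m$^{-2}$ K$^{-4}$ is the Stefan-Boltzmann flux constant. The sketch therefore reads the $\sigma$ of equation (26) as that combined radiation constant. The function names and the cutoff at $h\nu/kT = 40$ are illustrative choices.

```python
import math

# CODATA 2018 constants in SI units
H = 6.62607015e-34        # Planck constant, J s
K_B = 1.380649e-23        # Boltzmann constant, J/K
C = 2.99792458e8          # speed of light, m/s
SIGMA = 5.670374419e-8    # Stefan-Boltzmann constant, W m^-2 K^-4


def planck_spectral_density(nu, T):
    """Symmetric spectral energy density E_nu of equations (24)-(25), J s m^-3."""
    a = 8 * math.pi * H * nu**3 / C**3
    return a / math.expm1(H * nu / (K_B * T))


def total_energy_density(T, n=20000):
    """Integrate E_nu over frequency (equation (26)) with composite Simpson's rule.

    The integrand vanishes at nu = 0 and is negligible beyond h*nu/kT = 40,
    so the range [0, 40 kT/h] is used; n must be even.
    """
    nu_max = 40 * K_B * T / H
    step = nu_max / n
    total = 0.0                      # f(0) = 0 contributes nothing
    for i in range(1, n + 1):
        weight = 1 if i == n else (4 if i % 2 else 2)
        total += weight * planck_spectral_density(i * step, T)
    return total * step / 3


for T in (300.0, 1000.0, 5000.0):
    numeric = total_energy_density(T)
    closed_form = 4 * SIGMA / C * T**4
    print(f"T = {T:6.0f} K  numeric = {numeric:.6e}  4*sigma*T^4/c = {closed_form:.6e} J/m^3")
```

Doubling $T$ multiplies the result by 16, which is the $T^4$ behaviour that equation (26) asserts for symmetric radiation.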


In order to write Planck's law for radiation in matter or the vacuum with broken internal symmetries, a complex-number form of the radiation frequency is adopted and written as

$$\bar\nu = \nu e^{j\theta_\nu} = \nu(\cos\theta_\nu + j\sin\theta_\nu) \tag{29}$$

where $\bar\nu$ = complex-number frequency, $\nu$ = magnitude of frequency, and $\theta_\nu$ = frequency phase angle. For radiation in the asymmetric vacuum, the radiation frequency is written as

$$\bar\nu^{(v)} = \nu e^{j\theta_\nu^{(v)}} \tag{30}$$

where $\theta_\nu^{(v)}$ = internal phase angle of the photon frequency in the asymmetric vacuum. Using equations (24) and (29), the complex-number form of Planck's law is written as

$$\bar E_\nu = \frac{\bar A}{e^{h\bar\nu/kT} - 1} \tag{31}$$

where

$$\bar A = \frac{8\pi h \bar\nu^3}{c^3} \tag{32}$$

Placing equation (29) into equation (31) yields

$$\bar E_\nu = \frac{A}{D}(B + jC) = E_\nu e^{j\theta_{E\nu}} \tag{33}$$

$$E_\nu = \frac{A}{D}\sqrt{B^2 + C^2} \tag{34}$$

$$\theta_{E\nu} = \tan^{-1}\frac{C}{B} \tag{35}$$

where $A$ is given by equation (25) and where

$$D = e^{2x} - 2\cos y\, e^{x} + 1 \tag{36}$$

$$B = \cos(3\theta_\nu)(\cos y\, e^{x} - 1) + \sin(3\theta_\nu)\sin y\, e^{x} \tag{37}$$

$$C = \sin(3\theta_\nu)(\cos y\, e^{x} - 1) - \cos(3\theta_\nu)\sin y\, e^{x} \tag{38}$$

$$x = \frac{h\nu}{kT}\cos\theta_\nu \tag{39}$$

$$y = \frac{h\nu}{kT}\sin\theta_\nu \tag{40}$$

The expressions that hold for radiation in matter with broken internal symmetry also hold for radiation in the vacuum with broken internal symmetry, provided the substitution $\theta_\nu \rightarrow \theta_\nu^{(v)}$ is made in equations (36) through (40). This gives the photon energy density for radiation in the asymmetric vacuum as

$$\bar E_\nu^{(v)} = E_\nu^{(v)} e^{j\theta_{E\nu}^{(v)}} \tag{41}$$

where $E_\nu^{(v)}$ and $\theta_{E\nu}^{(v)}$ are obtained from equations (34) and (35) respectively.
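The closed forms (34) through (40) can be checked against a direct complex evaluation of equations (29), (31), and (32). The sketch below does both in SI units. The test frequency, temperature, and phase angle are arbitrary illustrative values, not data from the paper. Passing $\theta_\nu^{(v)}$ in place of $\theta_\nu$ gives the vacuum quantities of equation (41).

```python
import cmath
import math

H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K
C = 2.99792458e8     # speed of light, m/s


def planck_direct(nu, theta_nu, T):
    """Complex spectral energy density from equations (29), (31) and (32)."""
    nu_bar = nu * cmath.exp(1j * theta_nu)            # equation (29)
    a_bar = 8 * math.pi * H * nu_bar**3 / C**3        # equation (32)
    return a_bar / (cmath.exp(H * nu_bar / (K_B * T)) - 1)


def planck_polar(nu, theta_nu, T):
    """Magnitude E_nu and phase theta_Enu from equations (34)-(40)."""
    a = 8 * math.pi * H * nu**3 / C**3                # equation (25)
    x = H * nu / (K_B * T) * math.cos(theta_nu)       # equation (39)
    y = H * nu / (K_B * T) * math.sin(theta_nu)       # equation (40)
    ex = math.exp(x)
    d = ex**2 - 2 * math.cos(y) * ex + 1              # equation (36)
    b = (math.cos(3 * theta_nu) * (math.cos(y) * ex - 1)
         + math.sin(3 * theta_nu) * math.sin(y) * ex)  # equation (37)
    c = (math.sin(3 * theta_nu) * (math.cos(y) * ex - 1)
         - math.cos(3 * theta_nu) * math.sin(y) * ex)  # equation (38)
    # atan2 resolves the quadrant of B + jC, which equation (35) leaves implicit
    return a * math.hypot(b, c) / d, math.atan2(c, b)


nu, theta_nu, T = 1.0e14, 0.05, 3000.0
magnitude, phase = planck_polar(nu, theta_nu, T)
z = planck_direct(nu, theta_nu, T)
print(magnitude, abs(z))
print(phase, cmath.phase(z))
```

With $\theta_\nu = 0$ both routes reduce to the real Planck law (24).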


Matter in general has a spectral index of refraction, but it does not enter the photon spectral energy density as given by the Planck function. This is because the Planck function is universal and does not include specific properties of matter. The index of refraction does not enter the spectral energy calculation, whether or not the radiation is symmetric. Later in this paper it will be shown that the index of refraction does enter the calculation of the total energy density of photons in matter, because of an increased photon density in matter.

The spectral radiation pressure associated with the complex spectral radiation energy density of equation (31) is obtained by generalizing a radiation pressure formula given in the literature for mechanical radiation in matter:

$$P_{\nu a} = E_{\nu a}\left(\frac{1}{3} + \frac{n}{v_\nu}\frac{dv_\nu}{dn}\right) \tag{42}$$
where $P_{\nu a}$ = spectral radiation pressure, $n$ = particle number density for matter, and $v_\nu$ = speed of waves of frequency $\nu$. The generalization of equation (42) to the case of broken symmetry electromagnetic radiation is given by

$$\bar P_\nu = \bar E_\nu\left(\frac{1}{3} - \frac{n}{\bar\mu_\nu}\frac{d\bar\mu_\nu}{dn}\right) \tag{43}$$

where $\bar P_\nu$ = complex-number spectral radiation pressure in matter, and $\bar\mu_\nu$ = complex-number spectral index of refraction for electromagnetic waves in matter. For the vacuum with broken internal symmetry, equation (43) reduces to

$$\bar P_\nu^{(v)} = \tfrac{1}{3}\bar E_\nu^{(v)} \tag{44}$$


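because $\bar\mu_\nu = 1$ for the vacuum. If the complex-number spectral index of refraction is written as

$$\bar\mu_\nu = \mu_\nu e^{j\theta_{\mu\nu}} \tag{45}$$

then

$$\frac{n}{\bar\mu_\nu}\frac{d\bar\mu_\nu}{dn} = \frac{n}{\mu_\nu}\frac{d\mu_\nu}{dn} + jn\frac{d\theta_{\mu\nu}}{dn} = H_\nu e^{j\theta_{\mu\nu,n}} \tag{46}$$

and then equation (43) can be rewritten as

$$\bar P_\nu = E_\nu\left[\tfrac{1}{3}e^{j\theta_{E\nu}} - H_\nu e^{j(\theta_{E\nu} + \theta_{\mu\nu,n})}\right] \tag{47}$$

where

$$H_\nu = \sqrt{\left(\frac{n}{\mu_\nu}\frac{d\mu_\nu}{dn}\right)^2 + \left(n\frac{d\theta_{\mu\nu}}{dn}\right)^2} \tag{48}$$

and where

$$\tan\theta_{\mu\nu,n} = \frac{d\theta_{\mu\nu}/dn}{(1/\mu_\nu)\, d\mu_\nu/dn} \tag{49}$$

and finally where $\theta_{E\nu}$ is given by equation (35). For the vacuum $\bar\mu_\nu = 1$ and $\theta_{\mu\nu} = 0$, so that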
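$$\bar P_\nu^{(v)} = \tfrac{1}{3}E_\nu^{(v)} e^{j\theta_{E\nu}^{(v)}} \tag{50}$$

where $\theta_{E\nu}^{(v)}$ is given by equation (35) with the substitution $\theta_\nu \rightarrow \theta_\nu^{(v)}$. For the symmetric vacuum $\theta_\nu^{(v)} = 0$ and $\theta_{E\nu}^{(v)} = 0$, so that

$$P_{\nu s}^{(v)} = \tfrac{1}{3}E_{\nu s}^{(v)} = \tfrac{1}{3}E_{\nu s} = \tfrac{1}{3}E_{\nu a} \tag{51}$$

where $E_{\nu s}^{(v)} = E_{\nu s}$ for the symmetric vacuum or symmetric matter is given by equation (24). For symmetric radiation in symmetric matter, equation (43) becomes

$$P_{\nu s} = E_{\nu s}\left(\frac{1}{3} - \frac{n}{\mu_{\nu s}}\frac{d\mu_{\nu s}}{dn}\right) \tag{52}$$

where $P_{\nu s}$ = spectral radiation pressure for symmetric matter. Although $E_{\nu s}^{(v)} = E_{\nu s}$, one has $P_{\nu s} \neq P_{\nu s}^{(v)}$ because of the spectral index of refraction in equation (52).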

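Writing the complex spectral radiation pressure for asymmetric radiation as

$$\bar P_\nu = P_\nu e^{j\theta_{P\nu}} \tag{53}$$

and using equation (47) gives

$$P_\nu\cos\theta_{P\nu} = E_\nu\left[\tfrac{1}{3}\cos\theta_{E\nu} - H_\nu\cos(\theta_{E\nu} + \theta_{\mu\nu,n})\right] \tag{54}$$

$$P_\nu\sin\theta_{P\nu} = E_\nu\left[\tfrac{1}{3}\sin\theta_{E\nu} - H_\nu\sin(\theta_{E\nu} + \theta_{\mu\nu,n})\right] \tag{55}$$

Equations (54) and (55) give for the asymmetric vacuum

$$P_\nu^{(v)}\cos\theta_{P\nu}^{(v)} = \tfrac{1}{3}E_\nu^{(v)}\cos\theta_{E\nu}^{(v)} \tag{56}$$

$$P_\nu^{(v)}\sin\theta_{P\nu}^{(v)} = \tfrac{1}{3}E_\nu^{(v)}\sin\theta_{E\nu}^{(v)} \tag{57}$$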
(57) 
These can also be obtained directly from equation (50). From equations (56) and (57) it follows for the asymmetric vacuum that

$$P_\nu^{(v)} = \tfrac{1}{3}E_\nu^{(v)} \tag{58}$$

$$\theta_{P\nu}^{(v)} = \theta_{E\nu}^{(v)} \tag{59}$$


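A comparison of equations (51) and (58) shows that the relation $P_\nu^{(v)} = \tfrac{1}{3}E_\nu^{(v)}$ holds for both the symmetric and the asymmetric vacuum. From equations (54) and (55) it follows that

$$\tan\theta_{P\nu} = \frac{\tfrac{1}{3}\sin\theta_{E\nu} - H_\nu\sin(\theta_{E\nu} + \theta_{\mu\nu,n})}{\tfrac{1}{3}\cos\theta_{E\nu} - H_\nu\cos(\theta_{E\nu} + \theta_{\mu\nu,n})} \tag{60}$$

and

$$P_\nu = E_\nu\sqrt{\tfrac{1}{9} + H_\nu^2 - \tfrac{2}{3}H_\nu\cos\theta_{\mu\nu,n}} \tag{61}$$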


When $\theta_{\mu\nu,n} = 0$ and $\theta_{\mu\nu} = 0$, equation (61) reduces to the case of symmetric radiation in symmetric matter

$$P_{\nu s} = E_{\nu s}\left(\tfrac{1}{3} - H_s\right) \tag{62}$$

where

$$H_s = \frac{n}{\mu_{\nu s}}\frac{d\mu_{\nu s}}{dn} \tag{63}$$

for symmetric matter.
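Equations (60) and (61) are the polar form of the complex expression (47). The short sketch below evaluates both routes and also checks the symmetric limit (62). Its inputs are arbitrary illustrative values, not material data. Using `atan2` for equation (60) is a choice made here to keep the quadrant of the phase, since the paper gives only the tangent.

```python
import cmath
import math


def pressure_complex(E_nu, theta_E, H_nu, theta_mu_n):
    """Complex spectral pressure in matter, equation (47)."""
    return E_nu * (cmath.exp(1j * theta_E) / 3
                   - H_nu * cmath.exp(1j * (theta_E + theta_mu_n)))


def pressure_polar(E_nu, theta_E, H_nu, theta_mu_n):
    """Magnitude and phase of the spectral pressure, equations (60) and (61)."""
    num = math.sin(theta_E) / 3 - H_nu * math.sin(theta_E + theta_mu_n)
    den = math.cos(theta_E) / 3 - H_nu * math.cos(theta_E + theta_mu_n)
    magnitude = E_nu * math.sqrt(1 / 9 + H_nu**2
                                 - (2 / 3) * H_nu * math.cos(theta_mu_n))
    return magnitude, math.atan2(num, den)


p = pressure_complex(2.0, 0.1, 0.05, 0.3)
print(abs(p), cmath.phase(p))
print(pressure_polar(2.0, 0.1, 0.05, 0.3))
```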

(v) 

The aim is to determine the phase angles $\theta_{P\nu}$ and $\theta_{P\nu}^{(v)}$ in terms of the pressure internal phase angle $\theta_P$, which is associated with the state equation of bulk matter. To do this, the mechanical equilibrium of matter and radiation with broken internal symmetries must be considered. Consider a piece of matter bathed in the surrounding radiation of a vacuum with broken internal symmetry. Were no radiation present, the matter would have zero pressure, $P = 0$, because the situation corresponds to a minimum value of the binding energy at the equilibrium density. When radiation is present inside the matter and outside in the vacuum, the density of matter shifts from its $P = 0$ equilibrium value to a new value for which $P \neq 0$. The new equilibrium density depends on the internal and external radiation densities (pressures). These in turn depend on the temperature and frequency for monochromatic radiation, or solely on temperature for thermal radiation. The relationship between the induced matter pressure and the radiation pressure of a monochromatic radiation field will now be developed for matter and radiation with broken internal symmetries.

For matter in mechanical equilibrium with internal and external monochromatic radiation fields, the equilibrium condition at the surface boundary is

$$\bar P = \bar P_\nu^{(v)} - \bar P_\nu \tag{64}$$

where $\bar P_\nu^{(v)}$ = spectral radiation pressure in vacuum, $\bar P_\nu$ = spectral radiation pressure in matter, and $\bar P$ = complex matter mechanical pressure induced by the radiation fields. In component form, equation (64) can be rewritten as

$$P\cos\theta_P = P_\nu^{(v)}\cos\theta_{P\nu}^{(v)} - P_\nu\cos\theta_{P\nu} \tag{65}$$

$$P\sin\theta_P = P_\nu^{(v)}\sin\theta_{P\nu}^{(v)} - P_\nu\sin\theta_{P\nu} \tag{66}$$

Combining equations (65) and (66) with equations (54) through (57) gives

$$P\cos\theta_P = \tfrac{1}{3}E_\nu^{(v)}\cos\theta_{E\nu}^{(v)} - E_\nu\left[\tfrac{1}{3}\cos\theta_{E\nu} - H_\nu\cos(\theta_{E\nu} + \theta_{\mu\nu,n})\right] \tag{67}$$

$$P\sin\theta_P = \tfrac{1}{3}E_\nu^{(v)}\sin\theta_{E\nu}^{(v)} - E_\nu\left[\tfrac{1}{3}\sin\theta_{E\nu} - H_\nu\sin(\theta_{E\nu} + \theta_{\mu\nu,n})\right] \tag{68}$$

For the vacuum, the following conditions have been used

$$\theta_{\mu\nu}^{(v)} = 0, \qquad \theta_{\mu\nu,n}^{(v)} = 0, \qquad H_\nu^{(v)} = 0 \tag{69}$$

For the asymmetric vacuum $\theta_{E\nu}^{(v)} \neq 0$, just as radiation in matter has $\theta_{E\nu} \neq 0$. From equations (65) and (66)

$$\tan\theta_P = \frac{P_\nu^{(v)}\sin\theta_{P\nu}^{(v)} - P_\nu\sin\theta_{P\nu}}{P_\nu^{(v)}\cos\theta_{P\nu}^{(v)} - P_\nu\cos\theta_{P\nu}} \tag{70}$$

$$P^2 = \left(P_\nu^{(v)}\right)^2 + P_\nu^2 - 2P_\nu P_\nu^{(v)}\cos\left(\theta_{P\nu}^{(v)} - \theta_{P\nu}\right) \tag{71}$$

For the case $\theta_{P\nu} = 0$ and $\theta_{P\nu}^{(v)} = 0$, equations (65) through (71) reduce to their proper scalar forms. For instance, equation (71) gives the induced pressure in symmetrical matter as

$$P_s = P_{\nu s}^{(v)} - P_{\nu s} = \tfrac{1}{3}E_{\nu s}^{(v)} - E_{\nu s}\left(\tfrac{1}{3} - H_s\right) \tag{72}$$

But for symmetrical matter and a symmetrical vacuum $E_{\nu s}^{(v)} = E_{\nu s}$, so equation (72) can be written as

$$P_s = H_s E_{\nu s} \tag{73}$$


In general $P$ and $\theta_P$ are functions of matter density. Equations (70) and (71) can be satisfied only if the radiation fields alter the equilibrium density of matter. Equilibrium at a surface therefore requires that the internal phases of matter and radiation be related by equations (70) and (71).
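The balance (64) and its magnitude form (71) can be illustrated with a few lines of complex arithmetic. The sketch below uses arbitrary illustrative pressures and phases, not values from the paper. It also checks that in the fully symmetric case the induced pressure reduces to $H_s E_{\nu s}$, as in equation (73).

```python
import cmath
import math


def induced_pressure(P_vac, theta_vac, P_mat, theta_mat):
    """Complex induced matter pressure, equation (64): vacuum minus matter."""
    return (P_vac * cmath.exp(1j * theta_vac)
            - P_mat * cmath.exp(1j * theta_mat))


def induced_magnitude(P_vac, theta_vac, P_mat, theta_mat):
    """Law-of-cosines form of the same magnitude, equation (71)."""
    return math.sqrt(P_vac**2 + P_mat**2
                     - 2 * P_vac * P_mat * math.cos(theta_vac - theta_mat))


# Asymmetric example: both radiation pressures carry a small internal phase.
P = induced_pressure(0.70, 0.04, 0.55, 0.09)
print(abs(P), induced_magnitude(0.70, 0.04, 0.55, 0.09))

# Symmetric example: E_vac = E_mat = E and P_mat = E*(1/3 - H_s), equation (72).
E, H_s = 3.0, 0.1
P_s = induced_pressure(E / 3, 0.0, E * (1 / 3 - H_s), 0.0)
print(P_s.real, H_s * E)   # equation (73)
```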

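The total (integrated) energy density and associated pressure must now be determined for radiation in matter and in the vacuum with broken internal symmetries. The integrated radiation energy density is obtained from equations (31) through (40) by the following integral

$$\bar E_r = E_r e^{j\theta_{Er}} = \int_0^\infty \bar E_\nu\, d\bar\nu = \int_0^\infty E_\nu\sqrt{1 + (\nu\, d\theta_\nu/d\nu)^2}\; e^{j(\theta_{E\nu} + \theta_\nu + \beta_{\nu,\nu})}\, d\nu \tag{74}$$

where

$$\tan\beta_{\nu,\nu} = \nu\frac{d\theta_\nu}{d\nu} \tag{75}$$

and where the following result was used

$$d\bar\nu = e^{j\theta_\nu}(d\nu + j\nu\, d\theta_\nu) = \sqrt{1 + (\nu\, d\theta_\nu/d\nu)^2}\; e^{j(\theta_\nu + \beta_{\nu,\nu})}\, d\nu \tag{76}$$

The gauge-rotated frequency must therefore be used to evaluate the integrated radiation energy density. The radiation energy density has the following real and imaginary parts

$$E_r^R = E_r\cos\theta_{Er} = \int_0^\infty E_\nu\sqrt{1 + (\nu\, d\theta_\nu/d\nu)^2}\,\cos(\theta_{E\nu} + \theta_\nu + \beta_{\nu,\nu})\, d\nu \tag{77}$$

$$E_r^I = E_r\sin\theta_{Er} = \int_0^\infty E_\nu\sqrt{1 + (\nu\, d\theta_\nu/d\nu)^2}\,\sin(\theta_{E\nu} + \theta_\nu + \beta_{\nu,\nu})\, d\nu \tag{78}$$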
The magnitude and phase angle of the radiation energy are given in terms of these integrals as follows

$$E_r = \sqrt{(E_r^R)^2 + (E_r^I)^2} \tag{79}$$

$$\tan\theta_{Er} = \frac{E_r^I}{E_r^R} \tag{80}$$

The integrals in equations (74), (77), and (78) are not simple to evaluate, because the spectral energy density given by equations (34) and (35) has a complicated form. It follows from equations (34), (35), (77), and (78) that the energy density for asymmetric radiation does not have the $T^4$ temperature behaviour that holds for symmetric radiation according to equation (26). The radiation pressure is given by

$$\bar P_r = \tfrac{1}{3}\bar E_r + \Delta\bar P_r \tag{81}$$

where $\Delta\bar P_r$ = small difference in radiation pressure due to internal phase angles, considering only the complex-number values of the Planck function. Material properties, such as the index of refraction, do not enter the Planck analysis. Later in this paper it will be shown that the index of refraction enters the expressions for radiation energy density and pressure because of an increased photon density in matter. For the asymmetric vacuum, the integrated radiation density is given by

$$\bar E_r^{(v)} = E_r^{(v)} e^{j\theta_{Er}^{(v)}} = \int_0^\infty E_\nu^{(v)}\sqrt{1 + (\nu\, d\theta_\nu^{(v)}/d\nu)^2}\; e^{j[\theta_{E\nu}^{(v)} + \theta_\nu^{(v)} + \beta_{\nu,\nu}^{(v)}]}\, d\nu \tag{82}$$

$$E_r^{(v)} = \sqrt{\left(E_r^{(v)R}\right)^2 + \left(E_r^{(v)I}\right)^2}, \qquad \tan\theta_{Er}^{(v)} = \frac{E_r^{(v)I}}{E_r^{(v)R}} \tag{83}$$

These are formally identical in structure to equations (74) through (80) for asymmetric radiation in matter. Asymmetric radiation in the vacuum also lacks a $T^4$ dependence. The radiation pressure for the asymmetric vacuum is given by

$$\bar P_r^{(v)} = \tfrac{1}{3}\bar E_r^{(v)} + \Delta\bar P_r^{(v)} \tag{84}$$
where $\Delta\bar P_r^{(v)}$ = small difference in vacuum radiation pressure due to internal phase angles.

The index of refraction increases the photon number density. Taking this into account, the energy density and pressure of symmetrical thermal radiation in symmetrical matter are given by

$$E_{rMs} = \mu_s^3 E_{rs} \tag{85}$$

$$P_{rMs} = \mu_s^3 E_{rs}\left(\frac{1}{3} - \frac{n}{\mu_s}\frac{d\mu_s}{dn}\right) \tag{86}$$

where $E_{rMs}$ and $P_{rMs}$ = measured radiation energy density and pressure for symmetrical matter and radiation, and $\mu_s$ = density-dependent index of refraction averaged over frequency. The term $\mu_s^3$ arises from the general thermodynamic relations between pressure and energy density. A comparison of equations (26), (28), (85), and (86) shows that $E_{rMs} \neq E_{rs}$ and $P_{rMs} \neq P_{rs}$.

The expressions in (26) and (28) are entirely independent of material parameters such as $\mu_s$, and they follow from local thermodynamic equilibrium. This is why the $T^4$ law is universal, in the sense that it applies to all symmetric thermal radiation in symmetric matter or vacuum. The $\mu_s^3$ term in equation (85) represents a diffusion effect: the photon number density is increased because photons travel more slowly in matter than in the vacuum. The important point is that the $\mu_s^3$ term, like any other dependence on material properties, does not originate from the Planck distribution. The subscript M (for measured value) is added to all expressions that include a $\mu_s^3$ dependence.


In analogy to equations (85) and (86) for symmetric thermal radiation in symmetric matter, the measured radiation energy density and pressure for an asymmetric system are written as

$$\bar E_{rM} = W(\bar\mu)\,\bar E_r = E_{rM} e^{j\theta_{ErM}} \tag{87}$$

$$\bar P_{rM} = \bar E_{rM}\left(\frac{1}{3} - \frac{n}{\bar\mu}\frac{d\bar\mu}{dn}\right) \tag{88}$$

where $\bar E_r$ = energy density for asymmetric radiation in matter as calculated from the complex-number Planck function in equation (74), and $\bar E_{rM}$ = measured thermal radiation energy density in an asymmetric system. Also, $\bar\mu$ = density-dependent complex-number index of refraction for asymmetric matter averaged over frequency, and $W(\bar\mu)$ = a yet-to-be-determined function of the complex-number refraction index. The density-dependent, frequency-averaged index of refraction for asymmetric matter is written as

$$\bar\mu = \mu e^{j\theta_\mu} \tag{89}$$
Equation (88) is already an approximation, because the factor 1/3 holds only for symmetric radiation, as shown in equation (81). For asymmetric matter and radiation, the integrals in equations (77) and (78) have not been evaluated because of their complexity, so values of $\bar E_r$ in equation (87) have not been found. It is nevertheless clear from equation (74) that the leading term of $\bar E_r$ is the scalar $\sigma T^4$ corresponding to symmetric radiation

$$\bar E_r = \sigma T^4 + \Delta\bar E_r \tag{90}$$

where $\Delta\bar E_r$ is small if $\theta_\nu$ is small. From equations (81) and (90) it follows that

$$\bar P_r = \tfrac{1}{3}\sigma T^4 + \tfrac{1}{3}\Delta\bar E_r + \Delta\bar P_r \tag{90A}$$

The exact value of $W(\bar\mu)$ cannot be determined, since the value of $\bar E_r$ has not been determined. However, an approximate value of the factor $W(\bar\mu)$ can be found by using the first-order term in equation (90). This is done by first considering the Gibbs-Helmholtz relation (also called Maxwell's relation) applied to $\bar E_{rM}$ and $\bar P_{rM}$ as follows

$$\bar E_{rM} - n\frac{\partial\bar E_{rM}}{\partial n} = T\frac{\partial\bar P_{rM}}{\partial T} - \bar P_{rM} \tag{91}$$

Combining equations (87) and (88) with equation (91) yields

$$-n\frac{\partial}{\partial n}(W\bar E_r) + W\bar E_r = \left(\frac{1}{3} - \frac{n}{\bar\mu}\frac{d\bar\mu}{dn}\right)\left[T\frac{\partial}{\partial T}(W\bar E_r) - W\bar E_r\right] \tag{92}$$

Assume next that $W$ is a function of density alone. Assume also that $\bar E_r$ is given by the scalar term in equation (90), so that $\bar E_r \approx \sigma T^4$. It then follows from equation (92) that

$$\frac{n}{W}\frac{dW}{dn} = \frac{3n}{\bar\mu}\frac{d\bar\mu}{dn} \tag{93}$$

from which

$$\frac{dW}{W} = 3\frac{d\bar\mu}{\bar\mu} \tag{94}$$

and, choosing the integration constant so that $W = 1$ when $\bar\mu = 1$,

$$W = \bar\mu^3 \tag{95}$$
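A quick numerical check that $W = \bar\mu^3$ satisfies equation (92) when $\bar E_r \approx \sigma T^4$ is sketched below. It uses a hypothetical complex index profile $\bar\mu(n)$, invented for illustration and not a material model from the paper. It sets $\sigma = 1$ and compares the two sides of (92) with central finite differences.

```python
import cmath


def mu_bar(n):
    """Hypothetical density-dependent complex index of refraction (form of eq. (89))."""
    return (1.0 + 0.2 * n) * cmath.exp(1j * 0.05 * n)


def W(n):
    """Candidate weighting factor from equation (95)."""
    return mu_bar(n) ** 3


def E_rM(n, T):
    """Measured energy density W * sigma T^4 with sigma = 1, from (87) and (90)."""
    return W(n) * T**4


def sides_of_92(n, T, h=1e-5):
    """Left- and right-hand sides of equation (92) by central differences."""
    dE_dn = (E_rM(n + h, T) - E_rM(n - h, T)) / (2 * h)
    dE_dT = (E_rM(n, T + h) - E_rM(n, T - h)) / (2 * h)
    dmu_dn = (mu_bar(n + h) - mu_bar(n - h)) / (2 * h)
    lhs = -n * dE_dn + E_rM(n, T)
    rhs = (1 / 3 - n / mu_bar(n) * dmu_dn) * (T * dE_dT - E_rM(n, T))
    return lhs, rhs


lhs, rhs = sides_of_92(1.5, 2.0)
print(lhs, rhs)
```

Because this profile has $\bar\mu(0) = 1$, it also gives $W(0) = 1$, matching the normalization used for equation (95).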


From equations (87), (88), and (95) the following approximations are obtained

$$\bar E_{rM} = \bar\mu^3\bar E_r \tag{96}$$

$$\bar P_{rM} = \bar\mu^3\bar E_r\left(\frac{1}{3} - \frac{n}{\bar\mu}\frac{d\bar\mu}{dn}\right) \tag{97}$$

From equations (87), (89), and (96) it follows that

$$E_{rM} = \mu^3 E_r \tag{98}$$

$$\theta_{ErM} = \theta_{Er} + 3\theta_\mu \tag{99}$$

where $E_r$ and $\theta_{Er}$ are given by equations (79) and (80) respectively. All further analysis based on equations (96) and (97) is limited by the approximation used to derive them, namely that all asymmetries are small.


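The detailed calculation of the radiation pressure proceeds from equation (97). From equation (89) it follows that

$$\frac{n}{\bar\mu}\frac{d\bar\mu}{dn} = H e^{j\beta_{\mu,n}} \tag{100}$$

where

$$H = \sqrt{\left(\frac{n}{\mu}\frac{d\mu}{dn}\right)^2 + \left(n\frac{d\theta_\mu}{dn}\right)^2} \tag{101}$$

$$\tan\beta_{\mu,n} = \frac{n\, d\theta_\mu/dn}{(n/\mu)\, d\mu/dn} \tag{102}$$

The measured thermal radiation pressure in asymmetric bulk matter is then obtained from equations (97) through (99) as the following approximation

$$\bar P_{rM} = \mu^3 E_r\left[\tfrac{1}{3}e^{j\theta_{ErM}} - H e^{j(\theta_{ErM} + \beta_{\mu,n})}\right] \tag{103}$$

The component forms of equation (103) are

$$P_{rM}\cos\theta_{PrM} = \mu^3 E_r\left[\tfrac{1}{3}\cos\theta_{ErM} - H\cos(\theta_{ErM} + \beta_{\mu,n})\right] \tag{104}$$

$$P_{rM}\sin\theta_{PrM} = \mu^3 E_r\left[\tfrac{1}{3}\sin\theta_{ErM} - H\sin(\theta_{ErM} + \beta_{\mu,n})\right] \tag{105}$$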
From equations (104) and (105) it follows that

$$\tan\theta_{PrM} = \frac{\tfrac{1}{3}\sin\theta_{ErM} - H\sin(\theta_{ErM} + \beta_{\mu,n})}{\tfrac{1}{3}\cos\theta_{ErM} - H\cos(\theta_{ErM} + \beta_{\mu,n})} \tag{106}$$

$$P_{rM} = \mu^3 E_r\sqrt{\tfrac{1}{9} + H^2 - \tfrac{2}{3}H\cos\beta_{\mu,n}} \tag{107}$$

For the case $\beta_{\mu,n} = 0$ and $\theta_\mu = 0$ for symmetric radiation, equation (107) becomes

$$P_{rMs} = \mu_s^3 E_{rs}\left(\tfrac{1}{3} - H_s\right) \tag{108}$$

This is just equation (86), because $H_s$ here denotes $(n/\mu_s)\,d\mu_s/dn$ and $E_{rs} = \sigma T^4$ is given by equation (26).


For the vacuum $\mu = 1$, $\theta_\mu = 0$, $\beta_{\mu,n} = 0$, and $H = 0$. Equations (98) and (99) then give $E_{rM}^{(v)} = E_r^{(v)}$ and $\theta_{ErM}^{(v)} = \theta_{Er}^{(v)}$. For radiation in the asymmetric vacuum it follows from equations (104) and (105) that the following approximate equations are valid

$$P_r^{(v)}\cos\theta_{Pr}^{(v)} \approx \tfrac{1}{3}E_r^{(v)}\cos\theta_{Er}^{(v)} \tag{112}$$

$$P_r^{(v)}\sin\theta_{Pr}^{(v)} \approx \tfrac{1}{3}E_r^{(v)}\sin\theta_{Er}^{(v)} \tag{113}$$

For the vacuum it follows from equations (112) and (113) that approximately

$$\theta_{Pr}^{(v)} \approx \theta_{Er}^{(v)} \tag{114}$$

$$P_r^{(v)} \approx \tfrac{1}{3}E_r^{(v)} \tag{115}$$

where $\theta_{Er}^{(v)}$ and $E_r^{(v)}$ are obtained from the evaluation of the integral in equation (82).


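Consider now the equilibrium equations at the surface of asymmetric matter that is bathed in asymmetric thermal radiation of the vacuum. The condition for mechanical equilibrium at the surface of the body is that the induced mechanical pressure is

$$\bar P = \bar P_r^{(v)} - \bar P_{rM} \tag{116}$$

or equivalently

$$P\cos\theta_P = P_r^{(v)}\cos\theta_{Pr}^{(v)} - P_{rM}\cos\theta_{PrM} \tag{117}$$

$$P\sin\theta_P = P_r^{(v)}\sin\theta_{Pr}^{(v)} - P_{rM}\sin\theta_{PrM} \tag{118}$$

Combining equations (117) and (118) with equations (104), (105), (112), and (113) gives the following approximations

$$P\cos\theta_P = \tfrac{1}{3}E_r^{(v)}\cos\theta_{Er}^{(v)} - \mu^3 E_r\left[\tfrac{1}{3}\cos\theta_{ErM} - H\cos(\theta_{ErM} + \beta_{\mu,n})\right] \tag{119}$$

$$P\sin\theta_P = \tfrac{1}{3}E_r^{(v)}\sin\theta_{Er}^{(v)} - \mu^3 E_r\left[\tfrac{1}{3}\sin\theta_{ErM} - H\sin(\theta_{ErM} + \beta_{\mu,n})\right] \tag{120}$$

From equations (117) and (118) it follows that

$$P^2 = \left(P_r^{(v)}\right)^2 + P_{rM}^2 - 2P_r^{(v)}P_{rM}\cos\left(\theta_{Pr}^{(v)} - \theta_{PrM}\right) \tag{121}$$

$$\tan\theta_P = \frac{P_r^{(v)}\sin\theta_{Pr}^{(v)} - P_{rM}\sin\theta_{PrM}}{P_r^{(v)}\cos\theta_{Pr}^{(v)} - P_{rM}\cos\theta_{PrM}} \tag{122}$$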


For the case of symmetric matter and symmetric radiation in matter and in the vacuum, equation (121) becomes

$$P_s = P_{rs}^{(v)} - P_{rMs} = \tfrac{1}{3}E_{rs}^{(v)} - \mu_s^3E_{rs}\left(\tfrac{1}{3} - H_s\right) = \tfrac{1}{3}E_{rs}\left[1 - 3\mu_s^3\left(\tfrac{1}{3} - H_s\right)\right] = P_{rs}^{(v)}\left(1 - \mu_s^3 + 3\mu_s^3H_s\right) \tag{123}$$

where $P_{rs}^{(v)}$ is given by equation (28). Equation (123) can also be obtained directly from equation (119).
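The chain of equalities in (123) is simple algebra. The sketch below evaluates the first and last forms and confirms that they agree. It also confirms that the induced pressure vanishes when $\mu_s = 1$ and $H_s = 0$. The inputs are illustrative values in normalized units, not taken from the paper.

```python
def induced_pressure_difference(E_rs, mu_s, H_s):
    """First form of (123): vacuum pressure (28) minus measured pressure (86)."""
    P_vac = E_rs / 3
    P_rMs = mu_s**3 * E_rs * (1 / 3 - H_s)
    return P_vac - P_rMs


def induced_pressure_factored(E_rs, mu_s, H_s):
    """Last form of (123): P_rs^(v) * (1 - mu_s^3 + 3 mu_s^3 H_s)."""
    return (E_rs / 3) * (1 - mu_s**3 + 3 * mu_s**3 * H_s)


for mu_s, H_s in ((1.0, 0.0), (1.0003, 0.0002), (1.5, 0.05)):
    print(mu_s, H_s,
          induced_pressure_difference(1.0, mu_s, H_s),
          induced_pressure_factored(1.0, mu_s, H_s))
```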

The functions $\theta_\nu$ and $\theta_\nu^{(v)}$ have not been obtained explicitly, so the radiation energy $E_r$ is also unknown. For the rest of this paper it is enough to note one point. The frequency of photons in asymmetric bulk matter or vacuum has an internal phase angle, and this angle will manifest itself in the interactions of photons with other atomic particles.

3. BROKEN SYMMETRY OF ANGLES IN ASYMMETRIC BULK MATTER AND VACUUM. Within asymmetric bulk matter or the asymmetric vacuum, the internal phase angles of the coordinates produce a broken symmetry in various geometrical quantities, such as angles. These broken symmetry angles enter the basic calculations of the atomic processes treated in this paper. Consider first the fact that angles have internal phases. This can be deduced from the law of cosines, which for the complex-number lengths that appear in asymmetric bulk matter or vacuum can be written as

$$\cos\bar\phi = \frac{\bar a^2 + \bar b^2 - \bar c^2}{2\bar a\bar b} \tag{124}$$

where $\bar a$, $\bar b$, and $\bar c$ are the sides of a plane triangle and $\bar\phi$ is the angle opposite side $\bar c$. It is therefore clear that $\bar\phi$ and $\cos\bar\phi$ are complex numbers and can be written as

$$\bar\phi = \phi e^{j\theta_\phi} \tag{125}$$

$$\cos\bar\phi = C_\phi e^{j\theta_{c\phi}} \tag{126}$$

where $C_\phi$ = magnitude of $\cos\bar\phi$, and $\theta_{c\phi}$ = phase angle associated with $\cos\bar\phi$. It also follows that

$$\sin\bar\phi = S_\phi e^{j\theta_{s\phi}} \tag{127}$$

where $S_\phi$ = magnitude of $\sin\bar\phi$, and $\theta_{s\phi}$ = phase angle of $\sin\bar\phi$. From the expression

$$\cos\bar\phi = \tfrac{1}{2}\left(e^{j\bar\phi} + e^{-j\bar\phi}\right) \tag{128}$$

it follows by elementary algebra that

$$\cos\bar\phi = \cos\phi_R\cosh\phi_I - j\sin\phi_R\sinh\phi_I \tag{129}$$

where $\phi_R = \phi\cos\theta_\phi$ and $\phi_I = \phi\sin\theta_\phi$ are the real and imaginary parts of $\bar\phi$. These results will be used in Sections 5 and 6, where particle scattering in asymmetric bulk matter or vacuum is considered. The measured angle is given by $\phi_R = \phi\cos\theta_\phi$, where $\phi$ = conventional angle between two lines.
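The complex law of cosines (124) and the identity (129) can be exercised with Python's `cmath` module. In the sketch below the triangle sides are hypothetical complex lengths, chosen only for illustration: a 3-4-5 triangle given small internal phases. The real part of the resulting angle is the measured angle $\phi_R = \phi\cos\theta_\phi$.

```python
import cmath
import math


def complex_angle(a, b, c):
    """Angle opposite side c from the complex law of cosines, equation (124)."""
    cos_phi = (a**2 + b**2 - c**2) / (2 * a * b)
    return cmath.acos(cos_phi), cos_phi


def cos_from_parts(phi):
    """Equation (129): cos(phi_R + j phi_I) = cos phi_R cosh phi_I - j sin phi_R sinh phi_I."""
    phi_r, phi_i = phi.real, phi.imag
    return complex(math.cos(phi_r) * math.cosh(phi_i),
                   -math.sin(phi_r) * math.sinh(phi_i))


# Hypothetical complex side lengths with small internal phase angles.
a = 3 * cmath.exp(0.02j)
b = 4 * cmath.exp(0.01j)
c = 5 * cmath.exp(0.03j)

phi, cos_phi = complex_angle(a, b, c)
magnitude, theta_phi = abs(phi), cmath.phase(phi)        # equation (125)
print("phi =", phi)
print("measured angle phi_R =", magnitude * math.cos(theta_phi))
print(cos_phi, cos_from_parts(phi))
```

`cmath.acos` returns the principal branch, so this sketch picks the angle whose real part lies in $[0, \pi]$.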

4. PHOTOELECTRIC EFFECT IN ASYMMETRIC BULK MATTER OR VACUUM. A very simple atomic process is the photoelectric effect, in which a photon collides with an electron that is bound in matter. If the photon has sufficient energy, it overcomes the binding energy of the electron, and the electron leaves its site in the matter lattice with an excess kinetic energy. The process in bulk matter or vacuum with broken symmetries is described much as in the standard analysis for electrons and photons moving in symmetrical bulk matter or vacuum. The difference is that the kinematic variables for the photons and electrons now become complex numbers. This is related to the broken symmetry of space and time in bulk matter and the vacuum.

Within asymmetric bulk matter the binding energy of an electron is described by a complex-number potential $\bar W$, so that the binding energy is $e\bar W$, where $e$ = electric charge. Conservation of energy then requires

$$\tfrac{1}{2}m\bar v^2 = h\bar\nu - e\bar W \tag{137}$$
where $m$ = electron mass, $\bar v$ = complex-number electron velocity, and, as before, $\bar\nu$ = complex-number frequency of the photon. Within asymmetric bulk matter or vacuum the electron velocity has a broken internal symmetry and is written as

$$\bar v = v e^{j\theta_v} \tag{138}$$

where $v$ and $\theta_v$ = magnitude and internal phase angle respectively of the electron velocity. The complex-number binding potential is written as

$$\bar W = W e^{j\theta_W} \tag{139}$$

where $W$ and $\theta_W$ = magnitude and internal phase angle of the binding potential.

As described in Section 2, the photon frequency is a complex number for a photon propagating in asymmetric bulk matter or vacuum, and it is written as in equation (29). The quantities $\nu$, $\theta_\nu$, $W$, and $\theta_W$ are assumed known, and equation (137) is used to determine the unknown complex-number speed of the ejected electron.

The two scalar equations equivalent to equation (137) are

$$\frac{1}{2}mv^2\cos(2\theta_v) = h\nu\cos\theta_\nu - eW\cos\theta_W \qquad (140)$$

$$\frac{1}{2}mv^2\sin(2\theta_v) = h\nu\sin\theta_\nu - eW\sin\theta_W \qquad (141)$$

These two equations can be used to determine the unknown quantities $v$ and $\theta_v$ as follows

$$\tan(2\theta_v) = \frac{h\nu\sin\theta_\nu - eW\sin\theta_W}{h\nu\cos\theta_\nu - eW\cos\theta_W} \qquad (142)$$

$$\frac{1}{4}m^2v^4 = h^2\nu^2 + e^2W^2 - 2h\nu eW\cos(\theta_\nu - \theta_W) \qquad (143)$$


A plot of the kinetic energy of the electron versus frequency is shown in Figure 1, while a plot of the internal phase angle of the electron kinetic energy, $2\theta_v$, versus frequency is shown in Figure 2. These two figures show that there is a discontinuity in the kinetic energy magnitude and phase angle at a threshold frequency, which is obtained from equation (140) by taking $2\theta_v = \pi/2$, or

$$h\nu_t\cos\theta_{\nu t} = eW\cos\theta_W \qquad (144)$$

where $\nu_t$ = threshold frequency, and $\theta_{\nu t}$ = internal phase angle of the frequency at the threshold frequency. The electron kinetic energy at the threshold frequency is obtained from equation (141) to be
$$\frac{1}{2}mv^2 = h\nu_t\sin\theta_{\nu t} - eW\sin\theta_W = eW\left(\cos\theta_W\tan\theta_{\nu t} - \sin\theta_W\right) \qquad (145)$$


The threshold kinetic energy given by equation (145) is the minimum kinetic energy that the ejected electron can have in asymmetric bulk matter or vacuum. Below the threshold frequency the photoelectric process will not occur. If all phase angles are set equal to zero, the standard results are regained: the threshold frequency is given by $h\nu_t = eW$, and the minimum kinetic energy of the ejected electron is zero. Note that the measured electron kinetic energy is given by equation (140), which is linear in the photon frequency $\nu$. The measured frequency is equal to $\nu\cos\theta_\nu$.
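Equations (140) through (145) reduce to a few lines of arithmetic. The sketch below passes the photon energy $h\nu$ and the binding energy $eW$ directly as magnitudes with their internal phases, in arbitrary energy units with $m = 1$ by default; the function names `photoelectric` and `threshold` are hypothetical. It uses `atan2` to resolve the quadrant that equation (142) alone leaves ambiguous.

```python
import cmath
import math


def photoelectric(h_nu, theta_nu, eW, theta_W, m=1.0):
    """Return (KE, v, theta_v) with KE = (1/2) m v^2, from equations (142)-(143)."""
    re = h_nu * math.cos(theta_nu) - eW * math.cos(theta_W)  # right side of (140)
    im = h_nu * math.sin(theta_nu) - eW * math.sin(theta_W)  # right side of (141)
    two_theta_v = math.atan2(im, re)                         # (142)
    ke = math.sqrt(h_nu**2 + eW**2
                   - 2 * h_nu * eW * math.cos(theta_nu - theta_W))  # (143)
    return ke, math.sqrt(2 * ke / m), two_theta_v / 2


def threshold(eW, theta_W, theta_nu_t):
    """Threshold photon energy from (144) and minimum kinetic energy from (145)."""
    h_nu_t = eW * math.cos(theta_W) / math.cos(theta_nu_t)
    ke_min = eW * (math.cos(theta_W) * math.tan(theta_nu_t) - math.sin(theta_W))
    return h_nu_t, ke_min


h_nu_t, ke_min = threshold(4.0, 0.05, 0.10)
ke, v, theta_v = photoelectric(h_nu_t, 0.10, 4.0, 0.05)
print(h_nu_t, ke_min, ke, theta_v)  # at threshold 2*theta_v = pi/2 and ke = ke_min
```

Evaluating `photoelectric` at the threshold energy returned by `threshold` reproduces the minimum kinetic energy of equation (145), and with all phases zero `threshold` returns the standard result $h\nu_t = eW$ with zero kinetic energy.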

5. THOMSON SCATTERING AND THE COMPTON EFFECT IN ASYMMETRIC BULK MATTER AND VACUUM. The elastic scattering of photons by electrons is called Thomson scattering. For this case the photon energy is much smaller than the electron mass energy $mc^2$ and, if the electron is bound in an atom, the photon energy is larger than the binding energy, so that the electrons can be considered to be free. For this case the differential cross section is given by


$$I_a(\phi_a) = \frac{r_0^2}{2}\left(1 + \cos^2\phi_a\right) \qquad (146)$$

where $r_0$ = classical electron radius, and where $\phi_a$ = conventional scattering angle. The corresponding differential cross section for Thomson scattering of photons by electrons in asymmetric bulk matter or vacuum is given by

$$\bar I(\bar\phi) = I e^{j\theta_I} = \frac{r_0^2}{2}\left(1 + \cos^2\bar\phi\right) \qquad (147)$$


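Combining equations (126) and (147) gives

$$I\cos\theta_I = \frac{r_0^2}{2}\left[1 + C_\phi^2\cos(2\theta_{c\phi})\right] \qquad (148)$$

$$I\sin\theta_I = -\frac{r_0^2}{2}C_\phi^2\sin(2\theta_{c\phi}) \qquad (149)$$

or

$$\tan\theta_I = \frac{-C_\phi^2\sin(2\theta_{c\phi})}{1 + C_\phi^2\cos(2\theta_{c\phi})} \qquad (150)$$

$$I^2 = \frac{r_0^4}{4}\left[1 + C_\phi^4 + 2C_\phi^2\cos(2\theta_{c\phi})\right] \qquad (151)$$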
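As a check on equations (148) through (151), the sketch below compares the component form with a direct complex evaluation of equation (147). It assumes $\cos\bar\phi = C_\phi e^{-j\theta_{c\phi}}$ (the sign convention implied by equations (163) and (187) to (188)), sets $r_0 = 1$ by default, and uses the hypothetical name `thomson`.

```python
import cmath
import math


def thomson(phi, theta_phi, r0=1.0):
    """Return (I, theta_I) for I_bar = (r0^2/2)(1 + cos^2 phi_bar), via (148)-(150).

    Assumes cos(phi_bar) = C*exp(-j*theta_c).
    """
    c = cmath.cos(phi * cmath.exp(1j * theta_phi))
    C, theta_c = abs(c), -cmath.phase(c)
    re = 0.5 * r0**2 * (1 + C**2 * math.cos(2 * theta_c))  # (148)
    im = -0.5 * r0**2 * C**2 * math.sin(2 * theta_c)       # (149)
    return math.hypot(re, im), math.atan2(im, re)          # magnitude (151), phase (150)


I, theta_I = thomson(1.1, 0.15)
print(I, theta_I)
```

With $\theta_\phi = 0$ the phase vanishes and `thomson` returns the conventional result of equation (146); the measured cross section is then $I\cos\theta_I$.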
The Compton effect is the name associated with the quantum scattering of photons by electrons with a transfer of momentum and energy from the photons to the electrons. The description of this process using quanta of light was one of the early successes of quantum theory. This process is conventionally described by assuming that the photon and electron propagate in the symmetric vacuum, and applying the laws of conservation of energy and momentum to the colliding particles. When this process occurs within asymmetric bulk matter or vacuum, the same conservation laws are expected to be valid, except that now the kinematical parameters of the photon and the electron have broken symmetries and are represented by complex numbers.

Within bulk matter or vacuum with broken internal symmetries, a photon of initial frequency $\bar\nu$ collides with a stationary electron, a new photon of frequency $\bar\nu'$ is emitted at an angle $\bar\phi$ with respect to the initial photon direction, and the electron recoils with speed $\bar v$ in a direction $\bar\psi$ with respect to the initial photon direction. The conservation of energy for the nonrelativistic case then gives

$$h\bar\nu = \frac{1}{2}m\bar v^2 + h\bar\nu' \qquad (152)$$

while the conservation of momentum yields two equations

$$\frac{h\bar\nu}{c} = \frac{h\bar\nu'}{c}\cos\bar\phi + m\bar v\cos\bar\psi \qquad (153)$$

$$\frac{h\bar\nu'}{c}\sin\bar\phi = m\bar v\sin\bar\psi \qquad (154)$$

These equations can be used to determine the three unknown complex number quantities $\bar\nu'$, $\bar v$, and $\bar\phi$ in terms of the known quantities $\bar\nu$ and $\bar\psi$. These three conservation equations are expressed in terms of complex numbers and are therefore equivalent to six scalar equations. The two components of the nonrelativistic energy conservation equation (152) are

$$h\nu\cos\theta_\nu = \frac{1}{2}mv^2\cos(2\theta_v) + h\nu'\cos\theta'_\nu \qquad (155)$$

$$h\nu\sin\theta_\nu = \frac{1}{2}mv^2\sin(2\theta_v) + h\nu'\sin\theta'_\nu \qquad (156)$$

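The four momentum conservation equations obtained from equations (153) and (154) are respectively

$$\frac{h\nu}{c}\cos\theta_\nu = \frac{h\nu'}{c}C_\phi\cos(\theta'_\nu - \theta_{c\phi}) + mvC_\psi\cos(\theta_v - \theta_{c\psi}) \qquad (157)$$

$$\frac{h\nu}{c}\sin\theta_\nu = \frac{h\nu'}{c}C_\phi\sin(\theta'_\nu - \theta_{c\phi}) + mvC_\psi\sin(\theta_v - \theta_{c\psi}) \qquad (158)$$

$$\frac{h\nu'}{c}S_\phi = mvS_\psi \qquad (159)$$

$$\theta'_\nu + \theta_{s\phi} = \theta_v + \theta_{s\psi} \qquad (160)$$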


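where $C_\phi$ and $S_\phi$ are given by equations (131) and (135) respectively, $\theta_{c\phi}$ and $\theta_{s\phi}$ are given by equations (132) and (136) respectively, and similarly

$$C_\psi = \sqrt{\cos^2(\psi\cos\theta_\psi) + \sinh^2(\psi\sin\theta_\psi)} \qquad (161)$$

$$S_\psi = \sqrt{\sin^2(\psi\cos\theta_\psi) + \sinh^2(\psi\sin\theta_\psi)} \qquad (162)$$

$$\tan\theta_{c\psi} = \tan(\psi\cos\theta_\psi)\tanh(\psi\sin\theta_\psi) \qquad (163)$$

$$\tan\theta_{s\psi} = \cot(\psi\cos\theta_\psi)\tanh(\psi\sin\theta_\psi) \qquad (164)$$

where

$$\bar\psi = \psi e^{j\theta_\psi} = \psi_R + j\psi_I \qquad (165)$$

$$\cos\bar\psi = C_\psi e^{-j\theta_{c\psi}} \qquad (166)$$

$$\sin\bar\psi = S_\psi e^{j\theta_{s\psi}} \qquad (167)$$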

The six equations (155) through (160) can be solved simultaneously for the six unknowns $\nu'$, $\theta'_\nu$, $v$, $\theta_v$, $\phi$, and $\theta_\phi$ in terms of the four known quantities $\nu$, $\theta_\nu$, $\psi$, and $\theta_\psi$.
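The six scalar equations (155) through (160) are the real and imaginary parts of the three complex equations (152) through (154), so they can equally be solved in complex arithmetic. The sketch below does that with a plain Newton iteration and an analytic Jacobian; it is not an implementation of Brown's algorithm. It assumes hypothetical natural units $h = c = m = 1$, so photon energies must be much smaller than 1 for the nonrelativistic equations to apply. The initial guess must lie near the scattering solution, because $\bar\nu' = \bar\nu$, $\bar v = 0$, $\bar\phi = 0$ (no scattering) is also a root. All function names are illustrative.

```python
import cmath

# Hypothetical natural units h = c = m = 1; the nonrelativistic equations
# (152)-(154) then require photon energies much smaller than 1.


def residuals(x, nu, psi):
    """Complex residuals of (152)-(154) for the unknowns x = (nu_p, v, phi)."""
    nu_p, v, phi = x
    return [
        nu - 0.5 * v * v - nu_p,                           # energy, (152)
        nu - nu_p * cmath.cos(phi) - v * cmath.cos(psi),   # momentum along photon, (153)
        nu_p * cmath.sin(phi) - v * cmath.sin(psi),        # transverse momentum, (154)
    ]


def jacobian(x, psi):
    """Analytic complex Jacobian of the residuals with respect to (nu_p, v, phi)."""
    nu_p, v, phi = x
    return [
        [-1, -v, 0],
        [-cmath.cos(phi), -cmath.cos(psi), nu_p * cmath.sin(phi)],
        [cmath.sin(phi), -cmath.sin(psi), nu_p * cmath.cos(phi)],
    ]


def solve3(a, b):
    """Gaussian elimination with partial pivoting for a small complex system a x = b."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[i][col] -= f * aug[k][col]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(aug[i][col] * x[col] for col in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def newton(nu, psi, guess, tol=1e-15, max_iter=50):
    """Newton iteration in complex arithmetic; returns [nu_p, v, phi]."""
    x = [complex(g) for g in guess]
    for _ in range(max_iter):
        f = residuals(x, nu, psi)
        if max(abs(r) for r in f) < tol:
            break
        dx = solve3(jacobian(x, psi), [-r for r in f])
        x = [xi + di for xi, di in zip(x, dx)]
    return x
```

The six real unknowns of the text follow from the returned complex values as magnitudes (`abs`) and internal phases (`cmath.phase`).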

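For a bulk matter system or vacuum with broken internal symmetries, the relativistic analogs of the energy and momentum conservation equations (152) through (154) are

$$h\bar\nu = mc^2(\bar\gamma - 1) + h\bar\nu' \qquad (168)$$

$$\frac{h\bar\nu}{c} = \frac{h\bar\nu'}{c}\cos\bar\phi + m\bar\gamma\bar v\cos\bar\psi \qquad (169)$$

$$\frac{h\bar\nu'}{c}\sin\bar\phi = m\bar\gamma\bar v\sin\bar\psi \qquad (170)$$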
where the complex number velocity factor for a particle with a velocity that has a broken symmetry is

$$\bar\gamma = \gamma e^{j\theta_\gamma} = \left(1 - \bar v^2/c^2\right)^{-1/2} \qquad (171)$$

and where the magnitude and internal phase angle of the complex number boost are

$$\gamma = \left(f^2 + b^2\right)^{-1/4} \qquad (172)$$

$$\tan(2\theta_\gamma) = b/f \qquad (173)$$

where

$$b = (v^2/c^2)\sin(2\theta_v) \qquad (174)$$

$$f = 1 - (v^2/c^2)\cos(2\theta_v) \qquad (175)$$
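Equations (172) through (175) give the magnitude and internal phase of the complex boost without complex arithmetic. The sketch below, with $c = 1$ by default and the hypothetical name `boost`, evaluates them and can be compared with Python's principal-branch complex power applied to equation (171).

```python
import cmath
import math


def boost(v, theta_v, c=1.0):
    """Return (gamma, theta_gamma) for gamma_bar = (1 - v_bar^2/c^2)^(-1/2), (172)-(175)."""
    beta2 = (v / c) ** 2
    b = beta2 * math.sin(2 * theta_v)        # (174)
    f = 1 - beta2 * math.cos(2 * theta_v)    # (175)
    gamma = (f * f + b * b) ** -0.25         # (172)
    theta_gamma = 0.5 * math.atan2(b, f)     # (173), quadrant from atan2
    return gamma, theta_gamma


g, tg = boost(0.6, 0.1)
print(g, tg)
```

Because $1 - \bar v^2/c^2 = f - jb$, taking $\tfrac{1}{2}\operatorname{atan2}(b, f)$ for $\theta_\gamma$ matches the principal branch of $(f - jb)^{-1/2}$. For small $v/c$ the results approach $\gamma \approx 1 + \tfrac{1}{2}(v^2/c^2)\cos 2\theta_v$ and $\theta_\gamma \approx \tfrac{1}{2}(v^2/c^2)\sin 2\theta_v$.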


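The six scalar component equations corresponding to equations (168) through (170) are

$$h\nu\cos\theta_\nu = mc^2(\gamma\cos\theta_\gamma - 1) + h\nu'\cos\theta'_\nu \qquad (176)$$

$$h\nu\sin\theta_\nu = mc^2\gamma\sin\theta_\gamma + h\nu'\sin\theta'_\nu \qquad (177)$$

$$\frac{h\nu}{c}\cos\theta_\nu = \frac{h\nu'}{c}C_\phi\cos(\theta'_\nu - \theta_{c\phi}) + m\gamma vC_\psi\cos(\theta_\gamma + \theta_v - \theta_{c\psi}) \qquad (178)$$

$$\frac{h\nu}{c}\sin\theta_\nu = \frac{h\nu'}{c}C_\phi\sin(\theta'_\nu - \theta_{c\phi}) + m\gamma vC_\psi\sin(\theta_\gamma + \theta_v - \theta_{c\psi}) \qquad (179)$$

$$\frac{h\nu'}{c}S_\phi = m\gamma vS_\psi \qquad (180)$$

$$\theta'_\nu + \theta_{s\phi} = \theta_\gamma + \theta_v + \theta_{s\psi} \qquad (181)$$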


In the limit $v/c \to 0$, equations (176) through (181) reduce to equations (155) through (160) by noting that

$$\gamma \approx 1 + \frac{1}{2}\frac{v^2}{c^2}\cos(2\theta_v) \to 1 \qquad (182)$$

$$\theta_\gamma \approx \frac{1}{2}\frac{v^2}{c^2}\sin(2\theta_v) \to 0 \qquad (183)$$


The six equations (155) through (160) or (176) through (181) can be solved numerically using Brown's algorithm for the solution of simultaneous nonlinear equations. This algorithm is a modification of Newton's method and requires nc


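The solution of equations (168) through (170) can be obtained by direct analogy to the solution for the standard Compton effect as follows

$$\bar\lambda' - \bar\lambda = \lambda_0(1 - \cos\bar\phi) \qquad (184)$$

where

$$\bar\lambda' = \lambda' e^{j\theta'_\lambda} = c/\bar\nu' = (c/\nu')e^{-j\theta'_\nu} \qquad (185)$$

$$\bar\lambda = \lambda e^{j\theta_\lambda} = c/\bar\nu = (c/\nu)e^{-j\theta_\nu} \qquad (186)$$

and where $\lambda_0$ = Compton wavelength = $h/(mc)$. The scalar equivalents for equation (184) are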


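$$\lambda'\cos\theta'_\lambda = \lambda\cos\theta_\lambda + \lambda_0\left(1 - C_\phi\cos\theta_{c\phi}\right) \qquad (187)$$

$$\lambda'\sin\theta'_\lambda = \lambda\sin\theta_\lambda + \lambda_0C_\phi\sin\theta_{c\phi} \qquad (188)$$

From equations (187) and (188) it follows that

$$\tan\theta'_\lambda = \frac{\lambda\sin\theta_\lambda + \lambda_0C_\phi\sin\theta_{c\phi}}{\lambda\cos\theta_\lambda + \lambda_0\left(1 - C_\phi\cos\theta_{c\phi}\right)} \qquad (189)$$

$$(\lambda')^2 = \lambda^2 + 2\lambda\lambda_0\left[\cos\theta_\lambda - C_\phi\cos(\theta_\lambda + \theta_{c\phi})\right] + \lambda_0^2\left(1 - 2C_\phi\cos\theta_{c\phi} + C_\phi^2\right) \qquad (190)$$

Equations (189) and (190) give the wavelength internal phase angle and wavelength magnitude respectively of the scattered photon in asymmetric bulk matter or vacuum. The corresponding frequency equations can be obtained from equations (189) and (190) by noting that $\lambda' = c/\nu'$, $\lambda = c/\nu$, $\theta'_\lambda = -\theta'_\nu$, and $\theta_\lambda = -\theta_\nu$.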

Note  that  equation  (187)  gives  the  change  in  measured  wavelengths,  and  this 
wavelength  difference  is  independent  of  the  wavelength  itself. 
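The closed-form solution (187) through (190) can be evaluated directly. The sketch below assumes $\cos\bar\phi = C_\phi e^{-j\theta_{c\phi}}$ and takes $\lambda_0 = 1$ by default (units $h = c = m = 1$); the name `compton` is hypothetical. In the check that follows, the resulting $\bar\nu'$ is substituted back into equations (168) through (170), and the electron energy-momentum relation $(m\bar\gamma c^2)^2 - (m\bar\gamma\bar v c)^2 = (mc^2)^2$, which follows from equation (171), is verified.

```python
import cmath
import math


def compton(lam, theta_lam, phi, theta_phi, lam0=1.0):
    """Return (lam_p, theta_lam_p) for lam_bar' = lam_bar + lam0 (1 - cos phi_bar).

    Uses the scalar forms (187)-(190); assumes cos(phi_bar) = C*exp(-j*theta_c).
    """
    cphi = cmath.cos(phi * cmath.exp(1j * theta_phi))
    C, tc = abs(cphi), -cmath.phase(cphi)
    re = lam * math.cos(theta_lam) + lam0 * (1 - C * math.cos(tc))   # (187)
    im = lam * math.sin(theta_lam) + lam0 * C * math.sin(tc)         # (188)
    lam_p = math.sqrt(lam**2
                      + 2 * lam * lam0 * (math.cos(theta_lam) - C * math.cos(theta_lam + tc))
                      + lam0**2 * (1 - 2 * C * math.cos(tc) + C**2))  # (190)
    return lam_p, math.atan2(im, re)                                 # (189)


lam_p, th_p = compton(2.0, -0.05, 1.2, 0.1)
print(lam_p, th_p)
```

As the text notes, the measured wavelength shift $\lambda'\cos\theta'_\lambda - \lambda\cos\theta_\lambda$ does not depend on $\lambda$.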
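Consider now the differential cross section for Compton scattering in asymmetric bulk matter or vacuum. The standard Compton scattering differential cross section is given by the Klein-Nishina formula

$$I_a(\phi_a) = \frac{r_0^2}{2}\left(\frac{\nu'_a}{\nu_a}\right)^2\left(\frac{\nu_a}{\nu'_a} + \frac{\nu'_a}{\nu_a} - \sin^2\phi_a\right) \qquad (191)$$

where $\nu_a$ = conventionally determined initial photon frequency, and $\nu'_a$ = conventionally determined scattered photon frequency. The generalization to the differential scattering cross section for Compton scattering within bulk matter or the vacuum with broken internal symmetries follows from equation (191) as

$$\bar I(\bar\phi) = \frac{r_0^2}{2}\left(\frac{\bar\nu'}{\bar\nu}\right)^2\left(\frac{\bar\nu}{\bar\nu'} + \frac{\bar\nu'}{\bar\nu} - \sin^2\bar\phi\right) \qquad (192)$$

or equivalently as

$$\bar I(\bar\phi) = \frac{r_0^2}{2}\left(\frac{\nu'}{\nu}\right)^2\left[\frac{\nu}{\nu'}e^{j\rho_1} + \frac{\nu'}{\nu}e^{j\rho_2} - S_\phi^2e^{j\rho_3}\right] \qquad (193)$$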


where $\nu$ and $\nu'$ = magnitudes of the complex number initial and scattered photon frequencies respectively, and where

$$\rho_1 = \theta'_\nu - \theta_\nu \qquad (194)$$

$$\rho_2 = 3(\theta'_\nu - \theta_\nu) \qquad (195)$$

$$\rho_3 = 2(\theta'_\nu - \theta_\nu + \theta_{s\phi}) \qquad (196)$$

Therefore from equation (193) it follows that

$$I\cos\theta_I = \frac{r_0^2}{2}\left(\frac{\nu'}{\nu}\right)^2\left[\frac{\nu}{\nu'}\cos\rho_1 + \frac{\nu'}{\nu}\cos\rho_2 - S_\phi^2\cos\rho_3\right] \qquad (197)$$

$$I\sin\theta_I = \frac{r_0^2}{2}\left(\frac{\nu'}{\nu}\right)^2\left[\frac{\nu}{\nu'}\sin\rho_1 + \frac{\nu'}{\nu}\sin\rho_2 - S_\phi^2\sin\rho_3\right] \qquad (198)$$

from which $I$ and $\theta_I$ can easily be obtained. The measured differential cross section is $I_a = I\cos\theta_I$.
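The component form (197) and (198) can be checked against a direct complex evaluation of equation (192). In the sketch below $r_0 = 1$ by default, $\sin\bar\phi = S_\phi e^{+j\theta_{s\phi}}$, and the name `klein_nishina` is hypothetical. The check treats $\bar\nu$ and $\bar\nu'$ as independent inputs, since (192) through (198) are algebraic identities whether or not $\bar\nu'$ comes from equation (184).

```python
import cmath
import math


def klein_nishina(nu, theta_nu, nu_p, theta_nu_p, phi, theta_phi, r0=1.0):
    """Return (I, theta_I) from (197)-(198), with rho_1..rho_3 from (194)-(196)."""
    s = cmath.sin(phi * cmath.exp(1j * theta_phi))
    S, theta_s = abs(s), cmath.phase(s)        # sin(phi_bar) = S*exp(+j*theta_s)
    d = theta_nu_p - theta_nu
    rho1, rho2, rho3 = d, 3 * d, 2 * (d + theta_s)
    ratio = nu_p / nu
    pref = 0.5 * r0**2 * ratio**2
    re = pref * (math.cos(rho1) / ratio + ratio * math.cos(rho2) - S**2 * math.cos(rho3))  # (197)
    im = pref * (math.sin(rho1) / ratio + ratio * math.sin(rho2) - S**2 * math.sin(rho3))  # (198)
    return math.hypot(re, im), math.atan2(im, re)


I, theta_I = klein_nishina(1.0, 0.05, 0.7, 0.08, 1.0, 0.1)
print(I, theta_I)
```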

6. COULOMB SCATTERING IN BULK MATTER AND THE VACUUM WITH BROKEN INTERNAL SYMMETRIES. This section considers Rutherford, Mott, Bhabha, and Møller scattering in asymmetric bulk matter and vacuum.


A.  Rutherford  Scattering 


The α-particle scattering experiments of Rutherford are one of the cornerstones of knowledge about atomic structure. These experiments measured the scattering angles of α-particles interacting with atomic nuclei of charge $Ze$. The basic formulas for Rutherford scattering give the differential scattering cross section

$$I_a(\phi_a) = \frac{A}{v_a^4}\csc^4\frac{\phi_a}{2} \qquad (199)$$

where

$$A = \left(\frac{ZZ'e^2}{2m}\right)^2 \qquad (200)$$


$\phi_a$ = measured scattering angle, $v_a$ = conventionally calculated initial relative speed of the α-particle and the atomic nucleus, $Z'$ = atomic number of incident particle ($Z' = 2$ for an α-particle), $Z$ = atomic number of the atomic nucleus, and $m$ = reduced mass of the incident particle and the atomic nucleus. In addition to the differential cross section, the other quantity that is often calculated is the number of particles deviated through an angle between $\phi_a$ and $\phi_a + d\phi_a$, which is given by

$$\frac{dN_a}{d\phi_a} = \frac{4\pi A}{v_a^4}\cot\frac{\phi_a}{2}\csc^2\frac{\phi_a}{2} \qquad (201)$$

These formulas were deduced by considering the scattering of an α-particle by an isolated atomic nucleus situated in a symmetrical vacuum.


For Rutherford scattering within asymmetric bulk matter or vacuum, equations (199) and (201) need to be modified because the incident α-particle speed $\bar v$ is now a complex number, and because the deflection angle $\bar\phi$ is also a complex number. Therefore equations (199) and (201) must now be written as

$$\bar I(\bar\phi) = \frac{A}{\bar v^4}\csc^4\frac{\bar\phi}{2} \qquad (202)$$

$$\frac{d\bar N}{d\bar\phi} = \frac{4\pi A}{\bar v^4}\cot\frac{\bar\phi}{2}\csc^2\frac{\bar\phi}{2} \qquad (203)$$

where $\bar I(\bar\phi)$ = complex number differential scattering cross section, $\bar\phi$ = complex number deflection angle, $d\bar N/d\bar\phi$ = complex number of particles deviated through an angle between $\bar\phi$ and $\bar\phi + d\bar\phi$, and $\bar v$ = complex number initial α-particle speed. Because $\bar v$ and $\bar\phi$ are phase rotated, the number of α-particles scattered will also include a phase rotated part, so that

$$\bar N = Ne^{j\theta_N} \qquad (204)$$

where $N$ and $\theta_N$ = magnitude and internal phase angle respectively of the number of scattered particles.

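Using the following standard trigonometric formulas

$$\sin^2\frac{\theta}{2} = \frac{1}{2}(1 - \cos\theta) \qquad (205)$$

$$\tan\frac{\theta}{2} = \frac{\sin\theta}{1 + \cos\theta} \qquad (206)$$

$$\cos^2\frac{\theta}{2} = \frac{1}{2}(1 + \cos\theta) \qquad (207)$$

and combining them with equations (126) and (127) gives

$$\csc^2\frac{\bar\phi}{2} = K_s\,e^{-jx_\phi} \qquad (208)$$

$$\csc^4\frac{\bar\phi}{2} = K_s^2\,e^{-j2x_\phi} \qquad (209)$$

$$\cot\frac{\bar\phi}{2} = K_t\,e^{-jz_\phi} \qquad (211)$$

$$\sec^2\frac{\bar\phi}{2} = K_{se}\,e^{jy_\phi} \qquad (212)$$

where

$$K_s = 2\left(1 - 2C_\phi\cos\theta_{c\phi} + C_\phi^2\right)^{-1/2} \qquad (213)$$

$$K_t = \frac{\left(1 + 2C_\phi\cos\theta_{c\phi} + C_\phi^2\right)^{1/2}}{S_\phi} \qquad (214)$$

$$K_{se} = 2\left(1 + 2C_\phi\cos\theta_{c\phi} + C_\phi^2\right)^{-1/2} \qquad (215)$$

$$\tan x_\phi = \frac{C_\phi\sin\theta_{c\phi}}{1 - C_\phi\cos\theta_{c\phi}} \qquad (216)$$

$$\tan z_\phi = \frac{\sin\theta_{s\phi} + C_\phi\sin(\theta_{c\phi} + \theta_{s\phi})}{\cos\theta_{s\phi} + C_\phi\cos(\theta_{c\phi} + \theta_{s\phi})} \qquad (217)$$

$$\tan y_\phi = \frac{C_\phi\sin\theta_{c\phi}}{1 + C_\phi\cos\theta_{c\phi}} \qquad (218)$$

where $C_\phi$, $S_\phi$, $\theta_{c\phi}$, and $\theta_{s\phi}$ are given by equations (131), (135), (132), and (136) respectively.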


Combining equations (208) and (209) with equation (202) gives

$$\bar I(\bar\phi) = \frac{AK_s^2}{v^4}e^{-j(4\theta_v + 2x_\phi)} = Ie^{j\theta_I} \qquad (219)$$

or

$$I = \frac{AK_s^2}{v^4} \qquad (220)$$

$$\theta_I = -4\theta_v - 2x_\phi \qquad (221)$$

which are the equations for the magnitude and internal phase of the complex number differential cross section for Rutherford scattering in asymmetric matter or vacuum. Combining equations (208) and (211) with equation (203) gives


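$$\frac{d\bar N}{d\bar\phi} = \frac{4\pi A}{\bar v^4}\cot\frac{\bar\phi}{2}\csc^2\frac{\bar\phi}{2} = \frac{4\pi AK_sK_t}{v^4}e^{-j(4\theta_v + x_\phi + z_\phi)} \qquad (222)$$

and therefore

$$\frac{dN}{d\phi}\,\frac{\cos\delta_{\phi,\phi}}{\cos\delta_{N,\phi}} = \frac{4\pi AK_sK_t}{v^4} \qquad (223)$$

$$\theta_N + \delta_{N,\phi} - \theta_\phi - \delta_{\phi,\phi} = -4\theta_v - x_\phi - z_\phi \qquad (224)$$

where

$$\tan\delta_{N,\phi} = \frac{N\,d\theta_N/d\phi}{dN/d\phi} \qquad (225)$$

$$\tan\delta_{\phi,\phi} = \phi\,\frac{d\theta_\phi}{d\phi} \qquad (226)$$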

which gives the magnitude and phase angle of the number of scattered particles. The measured scattering cross section is given by $I\cos\theta_I$.
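The half-angle factors (213) through (218) and the Rutherford results (220) through (222) can be checked against direct complex evaluation of equations (202) and (203). The sketch below assumes $\cos\bar\phi = C_\phi e^{-j\theta_{c\phi}}$ and $\sin\bar\phi = S_\phi e^{+j\theta_{s\phi}}$, and it evaluates (216) through (218) with `atan2` to fix the quadrant. The function names are hypothetical, and the constant `A` of equation (200) is passed in as a plain number.

```python
import cmath
import math


def half_angle_factors(phi, theta_phi):
    """K_s, K_t, K_se and phases x, z, y from (213)-(218).

    Assumes cos(phi_bar) = C exp(-j theta_c) and sin(phi_bar) = S exp(+j theta_s).
    """
    pb = phi * cmath.exp(1j * theta_phi)
    C, tc = abs(cmath.cos(pb)), -cmath.phase(cmath.cos(pb))
    S, ts = abs(cmath.sin(pb)), cmath.phase(cmath.sin(pb))
    K_s = 2 / math.sqrt(1 - 2 * C * math.cos(tc) + C * C)     # (213)
    K_t = math.sqrt(1 + 2 * C * math.cos(tc) + C * C) / S     # (214)
    K_se = 2 / math.sqrt(1 + 2 * C * math.cos(tc) + C * C)    # (215)
    x = math.atan2(C * math.sin(tc), 1 - C * math.cos(tc))     # (216)
    z = math.atan2(math.sin(ts) + C * math.sin(tc + ts),
                   math.cos(ts) + C * math.cos(tc + ts))        # (217)
    y = math.atan2(C * math.sin(tc), 1 + C * math.cos(tc))     # (218)
    return K_s, K_t, K_se, x, z, y


def rutherford(A, v, theta_v, phi, theta_phi):
    """Magnitude and phase of I_bar, (220)-(221), and of dN_bar/dphi_bar, (222)."""
    K_s, K_t, K_se, x, z, y = half_angle_factors(phi, theta_phi)
    I, theta_I = A * K_s**2 / v**4, -4 * theta_v - 2 * x
    dN, theta_dN = 4 * math.pi * A * K_s * K_t / v**4, -4 * theta_v - x - z
    return I, theta_I, dN, theta_dN


print(rutherford(1.0, 0.5, 0.1, 0.8, 0.15))
```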

B.  Mott  Scattering 

Mott scattering describes the Coulomb scattering of two identical fermions such as, for example, two protons. The differential scattering cross section for two protons scattering in the symmetric vacuum is described in the center of mass coordinates by the following equation

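$$I_a(\phi_a) = \frac{A}{v_a^4}\left[\csc^4\frac{\phi_a}{2} + \sec^4\frac{\phi_a}{2} - \csc^2\frac{\phi_a}{2}\sec^2\frac{\phi_a}{2}\cos\left(2\zeta_a\ln\tan\frac{\phi_a}{2}\right)\right] \qquad (227)$$

where

$$A = \left(\frac{e^2}{2m}\right)^2 \qquad (228)$$

$$\zeta_a = \frac{e^2}{\hbar v_a} \qquad (229)$$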


and $m$ = reduced mass = $m_p/2$, where $m_p$ = proton mass, and $v_a$ = conventionally determined relative speed of the two protons. For the scattering of two protons within asymmetric bulk matter or vacuum, the differential scattering cross section is written as a complex number as follows


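$$\bar I(\bar\phi) = \frac{A}{\bar v^4}\left[\csc^4\frac{\bar\phi}{2} + \sec^4\frac{\bar\phi}{2} - \csc^2\frac{\bar\phi}{2}\sec^2\frac{\bar\phi}{2}\cos\left(2\bar\zeta\ln\tan\frac{\bar\phi}{2}\right)\right] \qquad (230)$$

where

$$\bar\zeta = \frac{e^2}{\hbar\bar v} = \frac{e^2}{\hbar v}e^{-j\theta_v} = \zeta e^{-j\theta_v} \qquad (231)$$

Equation (230) can be rewritten as

$$\bar I(\bar\phi) = \frac{A}{v^4}\left(\bar J_R + \bar J_E + \bar J_I\right) \qquad (232)$$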
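where the Rutherford term and the exchange term are written as

$$\bar J_R = K_s^2e^{j\theta_{JR}} \qquad (233)$$

$$\bar J_E = K_{se}^2e^{j\theta_{JE}} \qquad (234)$$

$$\theta_{JR} = -(4\theta_v + 2x_\phi) \qquad (235)$$

$$\theta_{JE} = -(4\theta_v - 2y_\phi) \qquad (236)$$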

The interaction term is written as

$$\bar J_I = -K_sK_{se}\cos\bar G\,e^{-j(4\theta_v + x_\phi - y_\phi)} \qquad (237)$$

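where from equations (211) and (230) it follows that

$$\bar G = Ge^{j\theta_G} = 2\bar\zeta\ln\tan\frac{\bar\phi}{2} = 2\zeta(\cos\theta_v - j\sin\theta_v)(jz_\phi - \ln K_t) = 2\zeta\left[z_\phi\sin\theta_v - \ln K_t\cos\theta_v + j\left(z_\phi\cos\theta_v + \ln K_t\sin\theta_v\right)\right] \qquad (238)$$

so that

$$G = 2\zeta\sqrt{z_\phi^2 + (\ln K_t)^2} \qquad (239)$$

$$\tan\theta_G = \frac{z_\phi\cos\theta_v + \ln K_t\sin\theta_v}{z_\phi\sin\theta_v - \ln K_t\cos\theta_v} \qquad (240)$$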


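The interaction term in equation (237) can be rewritten as

$$\bar J_I = J_Ie^{j\theta_{JI}} \qquad (241)$$

where

$$J_I = -K_sK_{se}C_G \qquad (242)$$

$$\theta_{JI} = -(4\theta_v + x_\phi - y_\phi + \theta_{cG}) \qquad (243)$$

$$C_G = \sqrt{\cos^2(G\cos\theta_G) + \sinh^2(G\sin\theta_G)} \qquad (244)$$

$$\tan\theta_{cG} = \tan(G\cos\theta_G)\tanh(G\sin\theta_G) \qquad (245)$$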

The two equations for determining $I$ and $\theta_I$ are obtained from equations (232) through (245) as

$$I\cos\theta_I = \frac{A}{v^4}\left(J_R\cos\theta_{JR} + J_E\cos\theta_{JE} + J_I\cos\theta_{JI}\right) \qquad (246)$$

$$I\sin\theta_I = \frac{A}{v^4}\left(J_R\sin\theta_{JR} + J_E\sin\theta_{JE} + J_I\sin\theta_{JI}\right) \qquad (247)$$

In this manner a theory of Mott scattering in an asymmetric medium is developed which is consistent with the gauge theory of the asymmetric background medium. The measured cross section is $I_a = I\cos\theta_I$.
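To check the term-by-term decomposition (232) through (247), the sketch below assembles $\bar I$ from the Rutherford, exchange, and interference terms and returns it as one complex number. It can then be compared with a direct complex evaluation of equation (230). For brevity the half-angle factors are read off in polar form from `cmath` rather than from equations (213) through (218), which give the same values. The sign conventions $\cos\bar G = C_G e^{-j\theta_{cG}}$ and $J_I$ as a signed amplitude are assumptions of this sketch, and the name `mott_components` is hypothetical.

```python
import cmath
import math


def mott_components(A, v, theta_v, zeta, phi, theta_phi):
    """Return I_bar for Mott scattering assembled as in (232)-(247).

    zeta is the magnitude of zeta_bar = zeta * exp(-j*theta_v). The half-angle
    factors are read off in polar form: csc^2 = K_s e^{-jx}, sec^2 = K_se e^{+jy},
    cot = K_t e^{-jz}.
    """
    pb = phi * cmath.exp(1j * theta_phi)
    csc2 = 1 / cmath.sin(pb / 2) ** 2
    sec2 = 1 / cmath.cos(pb / 2) ** 2
    cot = cmath.cos(pb / 2) / cmath.sin(pb / 2)
    K_s, x = abs(csc2), -cmath.phase(csc2)
    K_se, y = abs(sec2), cmath.phase(sec2)
    K_t, z = abs(cot), -cmath.phase(cot)
    # G_bar = 2 zeta_bar ln tan(phi_bar/2) = 2 zeta e^{-j theta_v} (j z - ln K_t), eq. (238)
    G = 2 * zeta * cmath.exp(-1j * theta_v) * (1j * z - math.log(K_t))
    cos_G = cmath.cos(G)
    C_G, th_cG = abs(cos_G), -cmath.phase(cos_G)   # cos G_bar = C_G e^{-j th_cG}
    terms = [
        (K_s**2, -(4 * theta_v + 2 * x)),                     # Rutherford term, (233), (235)
        (K_se**2, -(4 * theta_v - 2 * y)),                    # exchange term, (234), (236)
        (-K_s * K_se * C_G, -(4 * theta_v + x - y + th_cG)),  # interference, (241)-(243)
    ]
    re = A / v**4 * sum(J * math.cos(th) for J, th in terms)  # (246)
    im = A / v**4 * sum(J * math.sin(th) for J, th in terms)  # (247)
    return complex(re, im)


I_bar = mott_components(1.0, 0.5, 0.1, 0.3, 1.2, 0.1)
print(abs(I_bar), cmath.phase(I_bar))
```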

C.  Bhabha  Scattering 

Bhabha scattering is electron-positron scattering, $e^+ + e^- \to e^+ + e^-$, by photon exchange and pair annihilation. The differential scattering cross section in the center of mass system and in the high energy limit ($E_a \gg m$) is given for the symmetrical vacuum by


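$$I_a(\phi_a) = \frac{B}{\gamma_a^2v_a^2}\left[\frac{1 + \cos^4(\phi_a/2)}{\sin^4(\phi_a/2)} + \frac{1}{2}\left(1 + \cos^2\phi_a\right) - \frac{2\cos^4(\phi_a/2)}{\sin^2(\phi_a/2)}\right] \qquad (248)$$

where

$$B = \ldots \qquad (249)$$

$$\gamma_a = \left(1 - v_a^2/c^2\right)^{-1/2} \qquad (250)$$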

$\alpha$ = fine structure constant, $m$ = reduced mass of electron, and $v_a$ = conventionally determined speed in the center of mass system. The first term in equation (248) is the photon exchange term, the second term is the pair annihilation contribution, and the third term represents the interference between the first two terms.

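The corresponding cross section for Bhabha scattering in asymmetric bulk matter or vacuum is written as

$$\bar I(\bar\phi) = \frac{B}{\bar\gamma^2\bar v^2}\left[\frac{1 + \cos^4(\bar\phi/2)}{\sin^4(\bar\phi/2)} + \frac{1}{2}\left(1 + \cos^2\bar\phi\right) - \frac{2\cos^4(\bar\phi/2)}{\sin^2(\bar\phi/2)}\right] \qquad (251\text{-}260)$$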


where $\bar v$ and $\bar\gamma$ are given by equations (138) and (171) respectively. Combining equations (208) through (215) with equation (260) gives

$$\bar I(\bar\phi) = \frac{B}{\gamma^2v^2}\left(L_1e^{-j\delta_1} + L_2e^{-j\delta_2} + L_3e^{-j\delta_3} + L_4e^{-j\delta_4} + L_5e^{-j\delta_5}\right) \qquad (261)$$

where $\gamma$ is now the magnitude of the boost for a broken symmetry system and is given by equation (172), and where


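$$L_1 = K_s^2,\quad L_2 = K_s^2/K_{se}^2,\quad L_3 = 1/2,\quad L_4 = C_\phi^2/2,\quad L_5 = -2K_s/K_{se}^2 \qquad (262\text{-}263)$$

$$\begin{aligned}
\delta_1 &= 2(\theta_\gamma + \theta_v + x_\phi)\\
\delta_2 &= 2(\theta_\gamma + \theta_v + x_\phi + y_\phi)\\
\delta_3 &= 2(\theta_\gamma + \theta_v)\\
\delta_4 &= 2(\theta_\gamma + \theta_v + \theta_{c\phi})\\
\delta_5 &= 2(\theta_\gamma + \theta_v + x_\phi/2 + y_\phi)
\end{aligned} \qquad (264\text{-}266)$$

where $\theta_\gamma$ is given by equation (173). From equation (261) it follows that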

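$$I\cos\theta_I = \frac{B}{\gamma^2v^2}\left(L_1\cos\delta_1 + L_2\cos\delta_2 + L_3\cos\delta_3 + L_4\cos\delta_4 + L_5\cos\delta_5\right) \qquad (267)$$

$$I\sin\theta_I = -\frac{B}{\gamma^2v^2}\left(L_1\sin\delta_1 + L_2\sin\delta_2 + L_3\sin\delta_3 + L_4\sin\delta_4 + L_5\sin\delta_5\right) \qquad (268)$$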
from which $I$ and $\theta_I$ can be easily determined. In equations (261), (267), and (268) the first two terms are due to photon exchange, terms three and four are due to pair annihilation, and term five is the interference term.
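The five-term expansion (261) through (268) can be verified against a direct complex evaluation of the Bhabha formula. The sketch below takes the prefactor `B` of equation (249) as a plain input, because its expression is not recoverable here. It uses $c = 1$ in the check, reads the half-angle factors from `cmath` in polar form, and assumes $\cos\bar\phi = C_\phi e^{-j\theta_{c\phi}}$. The name `bhabha_components` is hypothetical.

```python
import cmath
import math


def bhabha_components(B, v, theta_v, gamma, theta_gamma, phi, theta_phi):
    """Return I_bar = B/(gamma^2 v^2) * sum_k L_k exp(-j delta_k), equations (261)-(268)."""
    pb = phi * cmath.exp(1j * theta_phi)
    csc2 = 1 / cmath.sin(pb / 2) ** 2
    sec2 = 1 / cmath.cos(pb / 2) ** 2
    cos_phi = cmath.cos(pb)
    K_s, x = abs(csc2), -cmath.phase(csc2)        # csc^2(phi_bar/2) = K_s e^{-jx}
    K_se, y = abs(sec2), cmath.phase(sec2)        # sec^2(phi_bar/2) = K_se e^{+jy}
    C, tc = abs(cos_phi), -cmath.phase(cos_phi)   # cos(phi_bar) = C e^{-j theta_c}
    base = 2 * (theta_gamma + theta_v)            # phase of 1/(gamma_bar^2 v_bar^2), negated
    L = [K_s**2, K_s**2 / K_se**2, 0.5, 0.5 * C**2, -2 * K_s / K_se**2]
    delta = [base + 2 * x, base + 2 * (x + y), base, base + 2 * tc, base + x + 2 * y]
    pref = B / (gamma**2 * v**2)
    re = pref * sum(l * math.cos(d) for l, d in zip(L, delta))    # (267)
    im = -pref * sum(l * math.sin(d) for l, d in zip(L, delta))   # (268)
    return complex(re, im)
```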


D. Møller Scattering


Møller scattering is electron-electron scattering, $e^- + e^- \to e^- + e^-$, by photon exchange. The differential scattering cross section for this process in the center of mass system and for high energy ($E_a \gg m$) is given for the symmetrical vacuum by


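$$I_a(\phi_a) = \frac{B}{\gamma_a^2v_a^2}\left[\frac{1 + \cos^4(\phi_a/2)}{\sin^4(\phi_a/2)} + \frac{2}{\sin^2(\phi_a/2)\cos^2(\phi_a/2)} + \frac{1 + \sin^4(\phi_a/2)}{\cos^4(\phi_a/2)}\right] \qquad (269)$$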


where $\phi_a$ = scattering angle, and $\gamma_a$ = ordinary relativistic boost given by equation (250). The first term in equation (269) is due to direct scattering, the second is due to interference, and the third term is the result of exchange scattering.

The corresponding differential scattering cross section for Møller scattering in bulk matter or vacuum with broken internal symmetries is given by


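$$\bar I(\bar\phi) = \frac{B}{\bar\gamma^2\bar v^2}\left[\frac{1 + \cos^4(\bar\phi/2)}{\sin^4(\bar\phi/2)} + \frac{2}{\sin^2(\bar\phi/2)\cos^2(\bar\phi/2)} + \frac{1 + \sin^4(\bar\phi/2)}{\cos^4(\bar\phi/2)}\right] \qquad (270)$$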


where the particle speed $\bar v$ and boost $\bar\gamma$ for a broken symmetry system are given by equations (138) and (171) respectively. Combining equation (270) with equations (208) through (215) gives


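$$\bar I(\bar\phi) = \frac{B}{\gamma^2v^2}\left(T_1e^{-j\psi_1} + T_2e^{-j\psi_2} + T_3e^{-j\psi_3} + T_4e^{-j\psi_4} + T_5e^{-j\psi_5}\right) \qquad (271)$$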

where the boost $\gamma$ for a broken symmetry system is given by equation (172), and where


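$$T_1 = K_s^2,\quad T_2 = K_s^2/K_{se}^2,\quad T_3 = 2K_sK_{se},\quad T_4 = K_{se}^2,\quad T_5 = K_{se}^2/K_s^2 \qquad (272\text{-}274)$$

where $K_s$ and $K_{se}$ are given by equations (213) and (215) respectively, and where

$$\begin{aligned}
\psi_1 &= 2(\theta_\gamma + \theta_v + x_\phi)\\
\psi_2 &= 2(\theta_\gamma + \theta_v + x_\phi + y_\phi)\\
\psi_3 &= 2(\theta_\gamma + \theta_v + x_\phi/2 - y_\phi/2)\\
\psi_4 &= 2(\theta_\gamma + \theta_v - y_\phi)\\
\psi_5 &= 2(\theta_\gamma + \theta_v - x_\phi - y_\phi)
\end{aligned} \qquad (275\text{-}277)$$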

where $x_\phi$ and $y_\phi$ are given by equations (216) and (218) respectively, and where $\theta_\gamma$ is expressed in terms of $\theta_v$ by equation (173). From equation (271) it follows that the magnitude and internal phase of the differential cross section for Møller scattering in a broken symmetry system are given by

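$$I\cos\theta_I = \frac{B}{\gamma^2v^2}\left(T_1\cos\psi_1 + T_2\cos\psi_2 + T_3\cos\psi_3 + T_4\cos\psi_4 + T_5\cos\psi_5\right) \qquad (278)$$

$$I\sin\theta_I = -\frac{B}{\gamma^2v^2}\left(T_1\sin\psi_1 + T_2\sin\psi_2 + T_3\sin\psi_3 + T_4\sin\psi_4 + T_5\sin\psi_5\right) \qquad (279)$$

which can be solved for $I$ and $\theta_I$ immediately.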
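The Møller expansion (271) through (279) admits the same kind of check as the Bhabha case: the five terms are $\csc^4$, $\cot^4$, $2\csc^2\sec^2$, $\sec^4$, and $\tan^4$ of $\bar\phi/2$. The sketch below treats the prefactor `B` as a plain input, uses $c = 1$ in the check, and reads the half-angle factors from `cmath` in polar form. The name `moller_components` is hypothetical.

```python
import cmath
import math


def moller_components(B, v, theta_v, gamma, theta_gamma, phi, theta_phi):
    """Return I_bar = B/(gamma^2 v^2) * sum_k T_k exp(-j psi_k), equations (271)-(279)."""
    pb = phi * cmath.exp(1j * theta_phi)
    csc2 = 1 / cmath.sin(pb / 2) ** 2
    sec2 = 1 / cmath.cos(pb / 2) ** 2
    K_s, x = abs(csc2), -cmath.phase(csc2)    # csc^2(phi_bar/2) = K_s e^{-jx}
    K_se, y = abs(sec2), cmath.phase(sec2)    # sec^2(phi_bar/2) = K_se e^{+jy}
    base = 2 * (theta_gamma + theta_v)
    # direct (csc^4 + cot^4), interference (2 csc^2 sec^2), exchange (sec^4 + tan^4)
    T = [K_s**2, K_s**2 / K_se**2, 2 * K_s * K_se, K_se**2, K_se**2 / K_s**2]
    psi = [base + 2 * x, base + 2 * (x + y), base + x - y, base - 2 * y, base - 2 * (x + y)]
    pref = B / (gamma**2 * v**2)
    re = pref * sum(t * math.cos(p) for t, p in zip(T, psi))    # (278)
    im = -pref * sum(t * math.sin(p) for t, p in zip(T, psi))   # (279)
    return complex(re, im)
```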

7. DIRAC EQUATION FOR FERMIONS IN ASYMMETRIC BULK MATTER OR VACUUM. The Dirac equation determines the spectrum and eigenfunctions of half-integral spin particles moving in an external potential. The eigenfunctions take the form of four-component spinors, and therefore the Dirac equation for a particle moving in the symmetric vacuum under the influence of an external potential must be equivalent to four equations. In fact the Dirac equation is a matrix equation involving 4×4 matrices and is written

$$\left(-i\gamma^\mu\frac{\partial}{\partial x^\mu} + m + V_e\right)\psi = 0 \qquad (280)$$

where $x^\mu = t, x, y, z$, and where $\gamma^\mu = \gamma^0$, $\gamma^1$, $\gamma^2$, and $\gamma^3$ are the four Dirac matrices. Within asymmetric bulk matter or vacuum the Dirac equation is expected to be written as

$$\left(-i\gamma^\mu\frac{\partial}{\partial\bar x^\mu} + m + \bar W\right)\bar\psi = 0 \qquad (281)$$

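where $\bar\psi$ = spinor with internal phase, given by

$$\bar\psi = \psi e^{j\theta_\psi} = \psi_R + j\psi_I \qquad (282)$$

$$\psi_R = \psi\cos\theta_\psi \qquad (283)$$

$$\psi_I = \psi\sin\theta_\psi \qquad (284)$$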


and where the complex number potential is written as

$$\bar W = \bar V_e + \bar V_g \qquad (285)$$


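The gauge rotated time and space coordinates $\bar x^\mu = \bar t$, $\bar x$, $\bar y$, and $\bar z$ of a particle in bulk matter or vacuum with broken internal symmetries are written as

$$\bar t = te^{j\theta_t},\qquad \bar x = xe^{j\theta_x} \qquad (286)$$

$$\bar y = ye^{j\theta_y},\qquad \bar z = ze^{j\theta_z} \qquad (287)$$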


The combined effects of gauge rotated coordinates, gauge rotated external potential (which is a function of the gauge rotated coordinates), and the gauge potential itself, $\bar V_g$, will manifest themselves in the eigenvalues and eigenfunctions of the Dirac equation for a fermion located in an asymmetric system.

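The space and time derivatives that appear in equation (281) are written as

$$\frac{\partial}{\partial\bar t} = e^{-j\theta_{dt}}\left[1 + \left(t\frac{\partial\theta_t}{\partial t}\right)^2\right]^{-1/2}\frac{\partial}{\partial t} = e^{-j\theta_{dt}}\cos\delta_{t,t}\frac{\partial}{\partial t} \qquad (288)$$

$$\frac{\partial}{\partial\bar x} = e^{-j\theta_{dx}}\left[1 + \left(x\frac{\partial\theta_x}{\partial x}\right)^2\right]^{-1/2}\frac{\partial}{\partial x} = e^{-j\theta_{dx}}\cos\delta_{x,x}\frac{\partial}{\partial x} \qquad (289)$$

$$\frac{\partial}{\partial\bar y} = e^{-j\theta_{dy}}\left[1 + \left(y\frac{\partial\theta_y}{\partial y}\right)^2\right]^{-1/2}\frac{\partial}{\partial y} = e^{-j\theta_{dy}}\cos\delta_{y,y}\frac{\partial}{\partial y} \qquad (290)$$

$$\frac{\partial}{\partial\bar z} = e^{-j\theta_{dz}}\left[1 + \left(z\frac{\partial\theta_z}{\partial z}\right)^2\right]^{-1/2}\frac{\partial}{\partial z} = e^{-j\theta_{dz}}\cos\delta_{z,z}\frac{\partial}{\partial z} \qquad (291)$$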

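where

$$\theta_{dt} = \theta_t + \delta_{t,t} = \theta_0 + \delta_{0,0} = \theta_{d0} \qquad (292)$$

$$\theta_{dx} = \theta_x + \delta_{x,x} = \theta_1 + \delta_{1,1} = \theta_{d1} \qquad (293)$$

$$\theta_{dy} = \theta_y + \delta_{y,y} = \theta_2 + \delta_{2,2} = \theta_{d2} \qquad (294)$$

$$\theta_{dz} = \theta_z + \delta_{z,z} = \theta_3 + \delta_{3,3} = \theta_{d3} \qquad (295)$$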


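and where

$$\tan\delta_{0,0} = \tan\delta_{t,t} = t\,\partial\theta_t/\partial t \qquad (296)$$

$$\tan\delta_{1,1} = \tan\delta_{x,x} = x\,\partial\theta_x/\partial x \qquad (297)$$

$$\tan\delta_{2,2} = \tan\delta_{y,y} = y\,\partial\theta_y/\partial y \qquad (298)$$

$$\tan\delta_{3,3} = \tan\delta_{z,z} = z\,\partial\theta_z/\partial z \qquad (299)$$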

In this way the necessary space and time derivatives in Dirac's equation for broken symmetry systems are evaluated.
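Equations (288), (292), and (296) follow from the chain rule. With $\bar t = te^{j\theta_t(t)}$, $d\bar t/dt = e^{j\theta_t}(1 + jt\,\partial\theta_t/\partial t)$, and inverting this gives $e^{-j\theta_{dt}}\cos\delta_{t,t}$. The sketch below checks that numerically against a central-difference derivative. The linear phase profile `theta_t(t) = -0.05 t` is purely hypothetical, chosen only to illustrate the formula.

```python
import cmath
import math


def theta_t(t):
    """Hypothetical internal phase of the time coordinate, for illustration only."""
    return -0.05 * t


def dtheta_t(t):
    """Derivative d(theta_t)/dt of the hypothetical profile above."""
    return -0.05


def d_dtbar_factor(t):
    """Factor multiplying d/dt in (288): exp(-j theta_dt) * cos(delta_tt)."""
    delta_tt = math.atan(t * dtheta_t(t))   # (296): tan(delta_tt) = t d(theta_t)/dt
    theta_dt = theta_t(t) + delta_tt        # (292)
    return cmath.exp(-1j * theta_dt) * math.cos(delta_tt)


print(d_dtbar_factor(2.0))
```

The same construction applies to $x$, $y$, and $z$ through equations (289) to (291), (293) to (295), and (297) to (299).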


Equation (281) can then be rewritten as

$$\left(-ie^{-j\theta_{d\mu}}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + m + \bar W\right)\bar\psi = 0 \qquad (300)$$


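The two matrix equations corresponding to equation (300) are obtained by taking the real and imaginary parts in the internal space as follows

$$\left(-i\cos\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + m + W\cos\theta_W\right)\psi_R - \left(i\sin\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + W\sin\theta_W\right)\psi_I = 0 \qquad (301)$$

$$\left(i\sin\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + W\sin\theta_W\right)\psi_R + \left(-i\cos\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + m + W\cos\theta_W\right)\psi_I = 0 \qquad (302)$$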
where it is assumed that the mass is a real number that is not affected by the gauge rotations due to the asymmetric background. Note that from equation (285) it follows that

$$W\cos\theta_W = V_e\cos\theta_{Ve} + V_g\cos\theta_{Vg} \qquad (303)$$

$$W\sin\theta_W = V_e\sin\theta_{Ve} + V_g\sin\theta_{Vg} \qquad (304)$$


The equations (301) and (302) are equivalent to eight equations for the eight spinor components $\psi_R^0$, $\psi_R^1$, $\psi_R^2$, $\psi_R^3$, $\psi_I^0$, $\psi_I^1$, $\psi_I^2$, $\psi_I^3$, or equivalently $\psi^0$, $\psi^1$, $\psi^2$, $\psi^3$, $\theta_{\psi 0}$, $\theta_{\psi 1}$, $\theta_{\psi 2}$, and $\theta_{\psi 3}$. Therefore Dirac's equation for a fermion located in a background with broken internal symmetry is equivalent to eight independent equations. An approximate solution ignores the imaginary wavefunction components, which gives


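$$\left(-i\cos\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + m + W\cos\theta_W\right)\psi_R = 0 \qquad (301\text{A})$$

$$\left(i\sin\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + W\sin\theta_W\right)\psi_R = 0 \qquad (302\text{A})$$

as the Dirac equations with four spinor components.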

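Alternatively, equation (300) can be combined with equation (282) to give the following set of Dirac equations

$$\left[-i\cos\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + \frac{\partial\theta_\psi}{\partial x^\mu}\right) + m + W\cos\theta_W\right]\psi = 0 \qquad (304\text{A})$$

$$\left[i\sin\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\left(\frac{\partial}{\partial x^\mu} + \frac{\partial\theta_\psi}{\partial x^\mu}\right) + W\sin\theta_W\right]\psi = 0 \qquad (304\text{B})$$

If the space and time derivatives of $\theta_\psi$ can be neglected, these equations become

$$\left(-i\cos\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + m + W\cos\theta_W\right)\psi = 0 \qquad (304\text{C})$$

$$\left(i\sin\theta_{d\mu}\cos\delta_{\mu,\mu}\,\gamma^\mu\frac{\partial}{\partial x^\mu} + W\sin\theta_W\right)\psi = 0 \qquad (304\text{D})$$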

8. SCHRÖDINGER'S EQUATION FOR A PARTICLE WITHIN ASYMMETRIC BULK MATTER OR VACUUM. This section considers the effects of bulk matter and vacuum with broken internal symmetries on Schrödinger's equation for a particle moving in a potential field. The time dependent Schrödinger equation for a particle moving in a potential field in a symmetric vacuum is written as
$$\left[\frac{1}{2m}\left(p_{xa}^2 + p_{ya}^2 + p_{za}^2\right) + V_{ea}\right]\psi_a = i\hbar\frac{\partial\psi_a}{\partial t_a} \qquad (305)$$


where the single particle momentum and energy operators are given by

$$p_{\alpha a} = -i\hbar\,\partial/\partial\alpha_a,\qquad E_a = i\hbar\,\partial/\partial t_a \qquad (306)$$


with $\alpha = x, y, z$. Within asymmetric bulk matter or vacuum it is assumed that space, time, momentum operators, energy operator, potential, and wave functions exhibit broken symmetries and must be represented by complex numbers in internal space. For this case the time dependent Schrödinger equation is written as


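$$\left[\frac{1}{2m}\left(\bar p_x^2 + \bar p_y^2 + \bar p_z^2\right) + \bar W\right]\bar\psi = i\hbar\frac{\partial\bar\psi}{\partial\bar t} \qquad (307)$$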


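where $\bar W = \bar V_e + \bar V_g$ and where

$$\bar p_\alpha = p_\alpha e^{j\theta_{p\alpha}} = -i\hbar\,\partial/\partial\bar\alpha = -i\hbar\cos\delta_{\alpha,\alpha}\,e^{-j\theta_{d\alpha}}\,\partial/\partial\alpha \qquad (308)$$

$$\bar E = Ee^{j\theta_E} = i\hbar\,\partial/\partial\bar t = i\hbar\cos\delta_{t,t}\,e^{-j\theta_{dt}}\,\partial/\partial t \qquad (309)$$

where

$$\cos\delta_{\alpha,\alpha} = \left[1 + \left(\alpha\,\partial\theta_\alpha/\partial\alpha\right)^2\right]^{-1/2} \qquad (310)$$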
$$\theta_{d\alpha} = \theta_\alpha + \delta_{\alpha,\alpha} \qquad (311)$$

$$\cos\delta_{t,t} = \left[1 + \left(t\,\partial\theta_t/\partial t\right)^2\right]^{-1/2} \qquad (312)$$

$$\theta_{dt} = \theta_t + \delta_{t,t} \qquad (313)$$


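where $\delta_{\alpha,\alpha}$ and $\delta_{t,t}$ are given in equations (296) through (299). From equations (308) and (309) it follows that

$$p_\alpha = -i\hbar\cos\delta_{\alpha,\alpha}\,\partial/\partial\alpha \qquad (314)$$

$$\theta_{p\alpha} = -\theta_{d\alpha} \qquad (315)$$

$$E = i\hbar\cos\delta_{t,t}\,\partial/\partial t \qquad (316)$$

$$\theta_E = -\theta_{dt} \qquad (317)$$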


It is easy to show from the Heisenberg uncertainty principle applied to $p_\alpha$ and $\alpha$, and to $E$ and $t$, that $\delta_{\alpha,\alpha} < 0$ and $\delta_{t,t} < 0$, so that from equations (296) through (299) it follows that $\theta_\alpha$ is a decreasing function of $\alpha$, and $\theta_t$ is a decreasing function of $t$.

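The kinetic energy operator in equation (307) is written as

$$\frac{1}{2m}\sum_{\alpha=1}^{3}\bar p_\alpha^2 = -\frac{\hbar^2}{2m}\sum_{\alpha=1}^{3}\cos\delta_{\alpha,\alpha}\,e^{-j\theta_{d\alpha}}\frac{\partial}{\partial\alpha}\left(\cos\delta_{\alpha,\alpha}\,e^{-j\theta_{d\alpha}}\frac{\partial}{\partial\alpha}\right) \qquad (318)$$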


For  simplicity  it  is  assumed  that  and  0^^  are  slowly  varying  functions  of 
position  so  that  equation  (318)  can  be  rewritten  as 


a=l 


2 

cos 


6 

a, a 


e 


-j2edct 


(319) 


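Writing the wavefunction as $\bar\psi = \psi_R + j\psi_I$ allows equation (307) to be written as two component equations as follows

$$-\frac{\hbar^2}{2m}\sum_{\alpha=1}^{3}\cos^2\beta_{\alpha,\alpha}\left[\cos(2\theta_{d\alpha})\frac{\partial^2\psi_R}{\partial\alpha^2}+\sin(2\theta_{d\alpha})\frac{\partial^2\psi_I}{\partial\alpha^2}\right] + W\left(\cos\theta_W\,\psi_R-\sin\theta_W\,\psi_I\right) = i\hbar\cos\beta_{t,t}\left(\cos\theta_{dt}\frac{\partial\psi_R}{\partial t}+\sin\theta_{dt}\frac{\partial\psi_I}{\partial t}\right) \tag{320}$$

$$-\frac{\hbar^2}{2m}\sum_{\alpha=1}^{3}\cos^2\beta_{\alpha,\alpha}\left[-\sin(2\theta_{d\alpha})\frac{\partial^2\psi_R}{\partial\alpha^2}+\cos(2\theta_{d\alpha})\frac{\partial^2\psi_I}{\partial\alpha^2}\right] + W\left(\sin\theta_W\,\psi_R+\cos\theta_W\,\psi_I\right) = i\hbar\cos\beta_{t,t}\left(-\sin\theta_{dt}\frac{\partial\psi_R}{\partial t}+\cos\theta_{dt}\frac{\partial\psi_I}{\partial t}\right) \tag{321}$$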
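The split of equation (307) into the real component equations (320) and (321) rests on one rule: multiplying $a + jb$ by $e^{j\theta}$ rotates the pair $(a, b)$ through the angle θ. The Python sketch below checks that rule for the kinetic term, where θ = −2θ_dα, and for the potential term, where θ = θ_W. It assumes Python's built-in imaginary unit can stand in for the internal unit $j$, and it does not model the external unit $i$. The function names are illustrative and do not come from the paper.

```python
import cmath
import math


def rotate_components(theta, re, im):
    """Real and imaginary parts of exp(j*theta) * (re + j*im).

    Python's built-in imaginary unit stands in for the internal unit j;
    the external unit i of the Schrodinger equation is not modelled here.
    """
    c, s = math.cos(theta), math.sin(theta)
    return c * re - s * im, s * re + c * im


def kinetic_bracket(cos_beta, theta_d, d2psi_r, d2psi_i):
    """Bracketed kinetic terms of (320)/(321) for one coordinate alpha.

    Returns the real and imaginary parts of
    cos^2(beta) * exp(-2j*theta_d) * (psi_R'' + j*psi_I'').
    """
    re, im = rotate_components(-2.0 * theta_d, d2psi_r, d2psi_i)
    return cos_beta ** 2 * re, cos_beta ** 2 * im


def potential_term(w, theta_w, psi_r, psi_i):
    """W(cos th_W psi_R - sin th_W psi_I) and W(sin th_W psi_R + cos th_W psi_I)."""
    re, im = rotate_components(theta_w, psi_r, psi_i)
    return w * re, w * im


if __name__ == "__main__":
    # Illustrative values: compare with a direct complex product.
    direct = 0.9 ** 2 * cmath.exp(-0.6j) * complex(1.7, -0.4)
    print(kinetic_bracket(0.9, 0.3, 1.7, -0.4), (direct.real, direct.imag))
```

The same two-line rotation gives the right-hand sides of (320) and (321), with θ = −θ_dt.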
Equations (320) and (321) can be used to determine $\psi_R$ and $\psi_I$. For a stationary state the wave function components are written as

$$\psi_R = \phi_R\,e^{-i\varepsilon t/\hbar}, \qquad \psi_I = \phi_I\,e^{-i\varepsilon t/\hbar} \tag{322}$$


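and equations (320) and (321) become

$$-\frac{\hbar^2}{2m}\sum_{\alpha=1}^{3}\cos^2\beta_{\alpha,\alpha}\left[\cos(2\theta_{d\alpha})\frac{\partial^2\phi_R}{\partial\alpha^2}+\sin(2\theta_{d\alpha})\frac{\partial^2\phi_I}{\partial\alpha^2}\right] + W\left(\cos\theta_W\,\phi_R-\sin\theta_W\,\phi_I\right) = \varepsilon\cos\beta_{t,t}\left(\cos\theta_{dt}\,\phi_R+\sin\theta_{dt}\,\phi_I\right) \tag{323}$$

$$-\frac{\hbar^2}{2m}\sum_{\alpha=1}^{3}\cos^2\beta_{\alpha,\alpha}\left[-\sin(2\theta_{d\alpha})\frac{\partial^2\phi_R}{\partial\alpha^2}+\cos(2\theta_{d\alpha})\frac{\partial^2\phi_I}{\partial\alpha^2}\right] + W\left(\sin\theta_W\,\phi_R+\cos\theta_W\,\phi_I\right) = \varepsilon\cos\beta_{t,t}\left(-\sin\theta_{dt}\,\phi_R+\cos\theta_{dt}\,\phi_I\right) \tag{324}$$

It is generally quite difficult to determine $\phi_R$ and $\phi_I$ (and ε) from equations (323) and (324). The form and magnitude of the functions $\beta_{\alpha,\alpha}$, $\theta_{d\alpha}$, and $\theta_{dt}$ depend on the nature, density, and temperature of the asymmetric bulk matter or vacuum surrounding a particle.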

Consider the case where the asymmetries are small enough that the imaginary part of the wavefunction can be neglected in equation (323). This equation then becomes

$$-\frac{\hbar^2}{2m}\sum_{\alpha=1}^{3}\cos^2\beta_{\alpha,\alpha}\cos(2\theta_{d\alpha})\frac{\partial^2\phi_R}{\partial\alpha^2} + W\cos\theta_W\,\phi_R = \varepsilon\cos\beta_{t,t}\cos\theta_{dt}\,\phi_R \tag{325}$$


For an isotropic system equation (325) becomes

$$\frac{d^2\phi_R}{dx^2} + \frac{2m}{3\hbar^2\cos^2\beta_{x,x}\cos(2\theta_{dx})}\left(\varepsilon\cos\beta_{t,t}\cos\theta_{dt} - W\cos\theta_W\right)\phi_R = 0 \tag{326}$$

This can be rewritten as

$$\frac{d^2\phi_R}{dx^2} + \frac{2m^*}{3\hbar^2}\left(\varepsilon - W^*\right)\phi_R = 0 \tag{327}$$

where m* is an effective mass given by

$$m^* = \frac{m}{\cos^2\beta_{x,x}\cos(2\theta_{dx})} \tag{328}$$

and W* is an energy dependent effective potential given by

$$W^* = \varepsilon\left(1-\cos\beta_{t,t}\cos\theta_{dt}\right) + W\cos\theta_W \tag{329}$$

Equations (323) and (324) can also be written in terms of the magnitude and phase angle of the wavefunction by writing $\phi_R = \phi\cos\theta_\phi$ and $\phi_I = \phi\sin\theta_\phi$. If the derivatives of the phase angle $\theta_\phi$ are small enough to be neglected, equations (323) and (324) can be rewritten as

$$-\frac{\hbar^2}{2m}\sum_{\alpha=1}^{3}\cos^2\beta_{\alpha,\alpha}\cos(2\theta_{d\alpha}-\theta_\phi)\frac{\partial^2\phi}{\partial\alpha^2} + W\cos(\theta_W+\theta_\phi)\,\phi = \varepsilon\cos\beta_{t,t}\cos(\theta_{dt}-\theta_\phi)\,\phi \tag{323A}$$

$$\frac{\hbar^2}{2m}\sum_{\alpha=1}^{3}\cos^2\beta_{\alpha,\alpha}\sin(2\theta_{d\alpha}-\theta_\phi)\frac{\partial^2\phi}{\partial\alpha^2} + W\sin(\theta_W+\theta_\phi)\,\phi = -\varepsilon\cos\beta_{t,t}\sin(\theta_{dt}-\theta_\phi)\,\phi \tag{324A}$$

For a one dimensional system, the factor 3 that appears in equations (326) and (327) should be replaced by unity. Therefore, in asymmetric bulk matter or vacuum, spacetime interactions give the particle an effective mass that is larger than the bare mass. In addition, an energy dependent effective potential arises whose value depends on the degree of asymmetry in the background of the particle.

At this point it is easy to treat the Klein-Gordon equation for a particle located in an asymmetric background. For a particle in a symmetric vacuum, the Klein-Gordon equation is written as

$$\frac{\partial^2\psi^a}{\partial t^{a\,2}} = c^2\sum_{\alpha}\frac{\partial^2\psi^a}{\partial\alpha^{a\,2}} - \frac{m^2c^4}{\hbar^2}\,\psi^a \tag{330}$$

Within asymmetric bulk matter or vacuum, spacetime interactions break the symmetry of the wave function and of the space and time coordinates, so that the Klein-Gordon equation becomes

$$\frac{\partial^2\bar\psi}{\partial\bar t^{\,2}} = c^2\left(\frac{\partial^2\bar\psi}{\partial\bar x^{2}}+\frac{\partial^2\bar\psi}{\partial\bar y^{2}}+\frac{\partial^2\bar\psi}{\partial\bar z^{2}}\right) - \frac{m^2c^4}{\hbar^2}\,\bar\psi \tag{331}$$

Taking the real and imaginary components of equation (331) gives

$$R_{2t}(\psi_R,\psi_I) = c^2\sum_{\alpha=x,y,z}R_{2\alpha}(\psi_R,\psi_I) - \frac{m^2c^4}{\hbar^2}\,\psi_R \tag{332}$$

$$I_{2t}(\psi_R,\psi_I) = c^2\sum_{\alpha=x,y,z}I_{2\alpha}(\psi_R,\psi_I) - \frac{m^2c^4}{\hbar^2}\,\psi_I \tag{333}$$

where

$$R_{2\eta}(\psi_R,\psi_I) = \cos^2\beta_{\eta,\eta}\left[\cos(2\theta_{d\eta})\frac{\partial^2\psi_R}{\partial\eta^2}+\sin(2\theta_{d\eta})\frac{\partial^2\psi_I}{\partial\eta^2}\right] \tag{334}$$

$$I_{2\eta}(\psi_R,\psi_I) = \cos^2\beta_{\eta,\eta}\left[-\sin(2\theta_{d\eta})\frac{\partial^2\psi_R}{\partial\eta^2}+\cos(2\theta_{d\eta})\frac{\partial^2\psi_I}{\partial\eta^2}\right] \tag{335}$$

where η = t, x, y, z, and where

$$\theta_{d\eta} = \theta_\eta + \beta_{\eta,\eta} \tag{336}$$

9. CONCLUSION. Because of spacetime interactions with bulk matter and the vacuum, these systems exhibit broken internal symmetries. For black body radiation in asymmetric bulk matter or vacuum, the photons have complex number frequencies, which produce a radiation pressure and energy density with broken internal symmetries. The space and time coordinates within a broken symmetry system are also gauge rotated and are described by internal phase angles. It follows that geometrical angles are described by complex numbers and have internal phase angles. The skewed nature of space and time affects the fundamental scattering processes of atomic particles. All atomic processes that occur in asymmetric bulk matter or vacuum should therefore also have broken symmetries, manifested in the measured differential cross sections. For broken symmetry quantum systems, the asymmetry produces an effective mass in the Schrödinger equation that is larger than the bare mass of a particle.
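The effective mass (328) and effective potential (329) are easy to evaluate once the phase angles are known. Below is a minimal Python sketch, assuming real angles in radians and consistent units. The helper names are hypothetical. It also checks the identity $\varepsilon\cos\beta_{t,t}\cos\theta_{dt} - W\cos\theta_W = \varepsilon - W^*$, which turns (326) into (327).

```python
import math


def effective_mass(m, cos_beta_xx, theta_dx):
    """Effective mass of equation (328): m / (cos^2(beta_xx) * cos(2 theta_dx)).

    Only meaningful while the denominator is positive (|theta_dx| < pi/4).
    """
    denom = cos_beta_xx ** 2 * math.cos(2.0 * theta_dx)
    if denom <= 0.0:
        raise ValueError("cos^2(beta_xx) * cos(2 theta_dx) must be positive")
    return m / denom


def effective_potential(eps, cos_beta_tt, theta_dt, w, theta_w):
    """Energy dependent effective potential W* of equation (329)."""
    return eps * (1.0 - cos_beta_tt * math.cos(theta_dt)) + w * math.cos(theta_w)


if __name__ == "__main__":
    m = 1.0  # bare mass in arbitrary units (illustrative)
    for theta_dx in (0.0, 0.1, 0.3):
        print(theta_dx, effective_mass(m, 0.95, theta_dx))
```

Since $\cos^2\beta_{x,x}\cos(2\theta_{dx}) \le 1$, the ratio $m^*/m$ is at least one whenever the denominator is positive. This is the sense in which the effective mass exceeds the bare mass. In the symmetric limit (all angles zero), m* = m and W* = W.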
Figure 1. Magnitude of electron kinetic energy versus the magnitude of the photon energy.

Figure 2. Phase angle of the electron kinetic energy versus the magnitude of the photon energy.


MAXWELL'S  EQUATIONS  WITH  BROKEN  INTERNAL  SYMMETRIES 


Richard  A.  Weiss 

U.  S.  Army  Engineer  Waterways  Experiment  Station 
Vicksburg,  Mississippi  39180 


ABSTRACT. Because the thermodynamic ground state and excited states of bulk matter and the vacuum have broken symmetries, the electric and magnetic fields in bulk matter and the vacuum exhibit broken internal symmetries. Maxwell's equations are formulated for an electromagnetic field with broken internal symmetry. Lorentz covariance is expressed in terms of space and time coordinates whose broken symmetries are represented by internal phase angles. Special relativity mechanics in bulk matter and the vacuum with broken symmetries is formulated for particles whose kinematic and dynamic variables exhibit internal phases. Electromagnetic wave equations for broken symmetry matter and vacuum are developed, together with the gauge conditions for the electromagnetic potential. The vacuum state is shown to have properties essentially similar to those of a bulk matter system; in particular, both exhibit broken internal symmetry. The description of electromagnetic effects in matter and the vacuum must properly account for the broken symmetry of the fields and of the space and time coordinates. The internal phase angles of the electromagnetic field vectors must be determined jointly with the internal phase angles of the space and time coordinates. A better knowledge of electromagnetic interactions in bulk matter will be useful for understanding electromagnetic wave propagation in the atmosphere. It will also help explain the complex processes that occur when high energy microwave beams interact with matter.

1. INTRODUCTION. Electrodynamics is a theory based on the Lorentz covariant set of Maxwell's equations and on the symmetry of the gauge group U(1). In this theory, charges and currents are the sources of the electromagnetic field. Maxwell's equations are a set of partial differential equations that determine the space and time variation of the electric and magnetic fields associated with a distribution of charges and currents. Classically, the charges and currents are situated in a passive space and time background (the vacuum), which is assumed to be inert and to play no active part in determining the fields. In quantum electrodynamics, the vacuum is taken to be a polarizable medium that can affect the energy levels of charged particle configurations. The active vacuum is one of the great discoveries of twentieth century physics. It has been experimentally verified in a number of ways, including measurement of the Lamb shift of energy levels.

This paper suggests an additional vacuum effect on the electromagnetic field: space and time coordinates within asymmetric bulk matter or vacuum acquire internal phase angles (broken symmetries). The electric and magnetic field vectors also acquire broken symmetries. The internal phase angles of the space and time coordinates and of the electromagnetic field vectors are due to the interaction of Minkowski spacetime with bulk matter, the electromagnetic field, and the vacuum. The internal phase angles of the electromagnetic field vectors must be determined jointly with those of the space and time coordinates. This is accomplished by jointly solving Maxwell's equations and the equations of motion of charged particles for a system with broken internal symmetries.


The relativistic values of the electromagnetic field vectors in asymmetric bulk matter or vacuum must satisfy the relativistic trace equation for radiation. This radiation trace equation relates the renormalized radiation pressure to the corresponding nonrelativistic radiation pressure. The trace equation for radiation is derived from the relativistic trace equation for the ground state of bulk matter, which is written as


or equivalently as

(I  -  b  +  T  ^  -  bV^)E  -  3(1  +  9  +  V  ^  -  ^)P  -  /  (2) 

where

and where $\bar U$, $\bar E$, $\bar P$, $\bar\gamma$, and $\bar b$ are complex number representations of the internal energy, energy density, pressure, and the gauge parameters, T = absolute temperature, and V = volume of a specified number of particles. The complex number Grüneisen parameter is defined as

$$\bar\gamma = \frac{V}{\bar C_V}\,\frac{\partial\bar P}{\partial T} = \frac{\partial\bar P/\partial T}{\partial\bar E/\partial T} = \gamma\,e^{j\theta_\gamma} \tag{4}$$

where γ and $\theta_\gamma$ are the magnitude and phase of the Grüneisen parameter, respectively. The corresponding equation for radiation in matter with internal phases is derived from equation (2) to be

(1  -  b  +  -  bvi)E^  -  S^(T  If  -  P)  (5) 

-  3[{1  +  T  +  -  -  S_^(l|f  -  P)] 


where 
a


(6) 


=  (T 


3T 


+  1  -  b 


‘)E^  -  B;(t 


ap 

9T 


-  p^) 


and where $\bar E_r$ and $\bar P_r$ are the complex number radiation energy density and radiation pressure, and the remaining two quantities in equation (5) are radiation gauge functions. The radiation Grüneisen parameter is defined by

$$\bar\gamma_r = \frac{\partial\bar P_r/\partial T}{\partial\bar E_r/\partial T}$$

Throughout this paper the index "a" refers to nonrelativistic (unrenormalized) calculations. Equation (1) with its right hand side set equal to zero represents the asymmetric ground state of the vacuum. Equation (5) with its right hand side equal to zero represents the excited (radiation) states of the asymmetric vacuum.

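The relativistic trace equations for the ground and excited states of bulk matter and the vacuum imply that the ground state and excited state pressure fields have broken symmetries. In turn, all of the descriptive variables of particles and fields located in asymmetric bulk matter or vacuum also exhibit broken symmetries. Therefore the space and time coordinates, as well as the electric and magnetic field vectors, exhibit broken symmetries, manifested as internal phase angles. The space and time coordinates are written as

$$\bar x = x\,e^{j\theta_x} \tag{7}$$

$$\bar y = y\,e^{j\theta_y} \tag{8}$$

$$\bar z = z\,e^{j\theta_z} \tag{9}$$

$$\bar t = t\,e^{j\theta_t} \tag{10}$$

and the derivatives with respect to the space and time coordinates are written as

$$\frac{\partial}{\partial\bar x} = e^{-j\theta_{dx}}\cos\beta_{x,x}\,\frac{\partial}{\partial x} \tag{11}$$

$$\frac{\partial}{\partial\bar t} = e^{-j\theta_{dt}}\cos\beta_{t,t}\,\frac{\partial}{\partial t} \tag{12}$$

where

$$\tan\beta_{x,x} = x\,\frac{\partial\theta_x}{\partial x} \tag{13}$$

$$\tan\beta_{t,t} = t\,\frac{\partial\theta_t}{\partial t} \tag{14}$$

$$\cos\beta_{x,x} = \left[1+\left(x\,\frac{\partial\theta_x}{\partial x}\right)^2\right]^{-1/2} \tag{15}$$

$$\cos\beta_{t,t} = \left[1+\left(t\,\frac{\partial\theta_t}{\partial t}\right)^2\right]^{-1/2} \tag{16}$$

$$\theta_{dx} = \theta_x + \beta_{x,x} \tag{17}$$

$$\theta_{dt} = \theta_t + \beta_{t,t} \tag{17A}$$

In a condensed notation, the coordinates and derivatives are written with η = t, x, y, z as follows

$$\bar\eta = \eta\,e^{j\theta_\eta} \tag{18}$$

$$\frac{\partial}{\partial\bar\eta} = e^{-j\theta_{d\eta}}\cos\beta_{\eta,\eta}\,\frac{\partial}{\partial\eta} \tag{19}$$

$$\theta_{d\eta} = \theta_\eta + \beta_{\eta,\eta} \tag{20}$$

$$\tan\beta_{\eta,\eta} = \eta\,\frac{\partial\theta_\eta}{\partial\eta} \tag{21}$$

$$\cos\beta_{\eta,\eta} = \left[1+\left(\eta\,\frac{\partial\theta_\eta}{\partial\eta}\right)^2\right]^{-1/2} \tag{22}$$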
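The operator rule (19) can be checked numerically. For $\bar\eta = \eta e^{j\theta_\eta}$, the chain rule gives $\partial/\partial\bar\eta = (d\bar\eta/d\eta)^{-1}\,\partial/\partial\eta$. The factor $d\bar\eta/d\eta = e^{j\theta_\eta}(1 + j\eta\,\partial\theta_\eta/\partial\eta)$ has magnitude $1/\cos\beta_{\eta,\eta}$ and phase $\theta_{d\eta}$. The Python sketch below compares the closed form of (19)-(22) with a finite-difference estimate. The phase profile `theta` is a made-up example, and Python's `1j` stands in for the internal unit $j$.

```python
import cmath
import math


def theta(eta):
    """Hypothetical internal phase profile theta_eta(eta); illustration only."""
    return 0.2 * math.exp(-eta)


def dtheta(eta, h=1e-6):
    """Central-difference estimate of d(theta_eta)/d(eta)."""
    return (theta(eta + h) - theta(eta - h)) / (2.0 * h)


def eta_bar(eta):
    """Complex coordinate of equation (18): eta * exp(j theta_eta)."""
    return eta * cmath.exp(1j * theta(eta))


def operator_factor(eta):
    """Factor exp(-j theta_d) cos(beta) multiplying d/d(eta) in equation (19)."""
    tan_beta = eta * dtheta(eta)                       # equation (21)
    cos_beta = 1.0 / math.sqrt(1.0 + tan_beta ** 2)    # equation (22)
    theta_d = theta(eta) + math.atan(tan_beta)         # equation (20)
    return cmath.exp(-1j * theta_d) * cos_beta


def numeric_factor(eta, h=1e-6):
    """d(eta)/d(eta_bar) = 1 / (d eta_bar / d eta), by central differences."""
    return 2.0 * h / (eta_bar(eta + h) - eta_bar(eta - h))


if __name__ == "__main__":
    for eta in (0.5, 1.0, 2.0):
        print(eta, operator_factor(eta), numeric_factor(eta))
```

Because $1 + j\tan\beta$ always has a positive real part, `math.atan` returns the correct angle here. No quadrant correction is needed.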

Note that the measured quantities are the real parts of the complex number quantities, such as space and time coordinates, electric and magnetic field vectors, pressure, and energy.^9

This paper develops Maxwell's equations for electromagnetic fields with broken internal symmetries, in space and time coordinates that also have broken internal symmetries. The paper is organized as follows:

- Section 2 considers the fields and coordinates with broken symmetry.
- Section 3 develops Maxwell's equations and the equations of motion of charged particles in an electromagnetic field in a broken symmetry vacuum or bulk matter system.
- Section 4 develops the consequences of assuming that Lorentz covariance holds for coordinate systems with broken internal symmetry.
- Section 5 writes the electromagnetic wave equations and their gauge conditions for systems with internal phase angles.
- Section 6 develops the equations of the relativistic vacuum from the corresponding bulk matter equations and suggests a broken symmetry condition for the vacuum state. Consequently, all of the conclusions for the bulk matter state with broken internal symmetries are also valid for the broken symmetry vacuum state.

The conventional coordinates $t^a$, $x^a$, $y^a$, $z^a$ are related to the measured values $t_m$, $x_m$, $y_m$, $z_m$ of the complex number coordinates by $t^a = t_m = t\cos\theta_t$, $x^a = x_m = x\cos\theta_x$, $y^a = y_m = y\cos\theta_y$, and $z^a = z_m = z\cos\theta_z$.


2. THE BROKEN SYMMETRY OF ELECTROMAGNETIC FIELDS. For electromagnetic waves within asymmetric bulk matter or vacuum, the electric and magnetic fields are expected to acquire internal phase angles. This is because the spacetime coordinates and the kinematical and dynamical variables of particles in asymmetric bulk matter or vacuum exhibit broken internal symmetries. In particular, the particle velocity, and therefore the electric current for charged particles, has an internal phase angle. Therefore the Cartesian components of the electric and magnetic field vectors in asymmetric bulk matter or vacuum can be written as

$$\bar E_\alpha = E_\alpha\,e^{j\theta_{E\alpha}} = E_{\alpha R} + jE_{\alpha I} \tag{23}$$

$$\bar D_\alpha = D_\alpha\,e^{j\theta_{D\alpha}} = D_{\alpha R} + jD_{\alpha I} \tag{24}$$

$$\bar H_\alpha = H_\alpha\,e^{j\theta_{H\alpha}} = H_{\alpha R} + jH_{\alpha I} \tag{25}$$

$$\bar B_\alpha = B_\alpha\,e^{j\theta_{B\alpha}} = B_{\alpha R} + jB_{\alpha I} \tag{26}$$

where α = x, y, and z. The phase angles $\theta_{E\alpha}$, $\theta_{D\alpha}$, $\theta_{H\alpha}$, and $\theta_{B\alpha}$ are in general functions of space and time of the general form

(27)

The field vector amplitudes are also functions of space and time, as for example

$$E_\alpha = E_\alpha(x, y, z, t, \theta_x, \theta_y, \theta_z, \theta_t) \tag{28}$$

The imaginary number j refers to internal phase angles associated with broken symmetries, while the imaginary number i refers to external phase angles. For plane waves, the magnitudes of the field vectors in equations (23) through (26) may be written as


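$$E_\alpha = A_{E\alpha}\,e^{i\xi} \tag{29}$$

$$D_\alpha = A_{D\alpha}\,e^{i\xi} \tag{30}$$

$$H_\alpha = A_{H\alpha}\,e^{i\xi} \tag{31}$$

$$B_\alpha = A_{B\alpha}\,e^{i\xi} \tag{32}$$

where

$$\xi = k_x x + k_y y + k_z z - \omega t \tag{32A}$$

where $A_{E\alpha}$, $A_{D\alpha}$, $A_{H\alpha}$, and $A_{B\alpha}$ are constants; $k_x$, $k_y$, $k_z$ are the wavenumber components; and ω is the magnitude of the frequency.

In general, the internal phase angles $\theta_{E\alpha}$, $\theta_{D\alpha}$, $\theta_{H\alpha}$, and $\theta_{B\alpha}$ are functions of space and time. It follows from equations (18) through (22) and (23) through (26) that

$$\frac{\partial\bar E_\alpha}{\partial\bar\eta} = \cos\beta_{\eta,\eta}\,W_{E\alpha,\eta}\,e^{j\Phi_{E\alpha,\eta}} \tag{33}$$

$$\frac{\partial\bar D_\alpha}{\partial\bar\eta} = \cos\beta_{\eta,\eta}\,W_{D\alpha,\eta}\,e^{j\Phi_{D\alpha,\eta}} \tag{34}$$

$$\frac{\partial\bar H_\alpha}{\partial\bar\eta} = \cos\beta_{\eta,\eta}\,W_{H\alpha,\eta}\,e^{j\Phi_{H\alpha,\eta}} \tag{35}$$

$$\frac{\partial\bar B_\alpha}{\partial\bar\eta} = \cos\beta_{\eta,\eta}\,W_{B\alpha,\eta}\,e^{j\Phi_{B\alpha,\eta}} \tag{36}$$

where η = t, x, y, z and where

$$W_{E\alpha,\eta} = \left[\left(\frac{\partial E_\alpha}{\partial\eta}\right)^2+\left(E_\alpha\frac{\partial\theta_{E\alpha}}{\partial\eta}\right)^2\right]^{1/2} \tag{37}$$

$$W_{D\alpha,\eta} = \left[\left(\frac{\partial D_\alpha}{\partial\eta}\right)^2+\left(D_\alpha\frac{\partial\theta_{D\alpha}}{\partial\eta}\right)^2\right]^{1/2} \tag{38}$$

$$W_{H\alpha,\eta} = \left[\left(\frac{\partial H_\alpha}{\partial\eta}\right)^2+\left(H_\alpha\frac{\partial\theta_{H\alpha}}{\partial\eta}\right)^2\right]^{1/2} \tag{39}$$

$$W_{B\alpha,\eta} = \left[\left(\frac{\partial B_\alpha}{\partial\eta}\right)^2+\left(B_\alpha\frac{\partial\theta_{B\alpha}}{\partial\eta}\right)^2\right]^{1/2} \tag{40}$$

$$\Phi_{E\alpha,\eta} = \theta_{E\alpha} + \beta_{E\alpha,\eta} - \theta_{d\eta} \tag{41}$$

$$\Phi_{D\alpha,\eta} = \theta_{D\alpha} + \beta_{D\alpha,\eta} - \theta_{d\eta} \tag{42}$$

$$\Phi_{H\alpha,\eta} = \theta_{H\alpha} + \beta_{H\alpha,\eta} - \theta_{d\eta} \tag{43}$$

$$\Phi_{B\alpha,\eta} = \theta_{B\alpha} + \beta_{B\alpha,\eta} - \theta_{d\eta} \tag{44}$$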
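$$\tan\beta_{E\alpha,\eta} = E_\alpha\,\frac{\partial\theta_{E\alpha}/\partial\eta}{\partial E_\alpha/\partial\eta} \tag{45}$$

$$\tan\beta_{D\alpha,\eta} = D_\alpha\,\frac{\partial\theta_{D\alpha}/\partial\eta}{\partial D_\alpha/\partial\eta} \tag{46}$$

$$\tan\beta_{H\alpha,\eta} = H_\alpha\,\frac{\partial\theta_{H\alpha}/\partial\eta}{\partial H_\alpha/\partial\eta} \tag{47}$$

$$\tan\beta_{B\alpha,\eta} = B_\alpha\,\frac{\partial\theta_{B\alpha}/\partial\eta}{\partial B_\alpha/\partial\eta} \tag{48}$$

and where

$$\theta_{d\eta} = \theta_\eta + \beta_{\eta,\eta} \tag{49}$$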
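Equations (33), (37), (41), and (45) amount to two steps. First, $\partial(E_\alpha e^{j\theta_{E\alpha}})/\partial\eta = e^{j\theta_{E\alpha}}(\partial E_\alpha/\partial\eta + jE_\alpha\,\partial\theta_{E\alpha}/\partial\eta)$ is written in polar form. Second, the operator of equation (19) is applied. The Python sketch below compares that polar form with the direct product, using arbitrary sample inputs. It uses `atan2` for $\beta_{E\alpha,\eta}$, an implementation choice not taken from the paper, so that the angle lands in the right quadrant when $\partial E_\alpha/\partial\eta < 0$.

```python
import cmath
import math


def derivative_polar(E, dE, theta_E, dtheta_E, cos_beta_nn, theta_dn):
    """Closed form of equations (33), (37), (41) and (45) for one component.

    Returns (dEbar/d(eta_bar), W, Phi). atan2 resolves the quadrant that the
    tangent in (45) leaves ambiguous by pi.
    """
    W = math.hypot(dE, E * dtheta_E)            # equation (37)
    beta_E = math.atan2(E * dtheta_E, dE)       # equation (45)
    Phi = theta_E + beta_E - theta_dn           # equation (41)
    return cos_beta_nn * W * cmath.exp(1j * Phi), W, Phi


def derivative_direct(E, dE, theta_E, dtheta_E, cos_beta_nn, theta_dn):
    """Apply the operator of equation (19) directly to E * exp(j theta_E)."""
    dEbar_deta = cmath.exp(1j * theta_E) * complex(dE, E * dtheta_E)
    return cos_beta_nn * cmath.exp(-1j * theta_dn) * dEbar_deta


if __name__ == "__main__":
    sample = (1.3, 0.4, 0.2, -0.05, 0.97, 0.1)  # illustrative values
    print(derivative_polar(*sample)[0], derivative_direct(*sample))
```

The real and imaginary parts of the returned derivative are the quantities written later as $R_{1\eta}$ and $I_{1\eta}$ in equations (56) and (57).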


If the electric and magnetic fields have an external time dependence given by equations (29) through (32), it follows that

where ξ is given by equation (32A), with similar expressions for the magnetic vector components. Similarly, the derivative with respect to the coordinate x is given by

with similar expressions for the magnetic field vector components and for the derivatives with respect to y and z.

It is sometimes convenient to write the derivatives in equations (33) and (36) in the following alternative forms


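$$\frac{\partial\bar E_\alpha}{\partial\bar\eta} = \cos\beta_{\eta,\eta}\,e^{-j\theta_{d\eta}}\,\frac{\partial\bar E_\alpha}{\partial\eta} = R_{1\eta}(E_{\alpha R},E_{\alpha I}) + j\,I_{1\eta}(E_{\alpha R},E_{\alpha I}) \tag{54}$$

$$\frac{\partial\bar B_\alpha}{\partial\bar\eta} = \cos\beta_{\eta,\eta}\,e^{-j\theta_{d\eta}}\,\frac{\partial\bar B_\alpha}{\partial\eta} = R_{1\eta}(B_{\alpha R},B_{\alpha I}) + j\,I_{1\eta}(B_{\alpha R},B_{\alpha I}) \tag{55}$$

where

$$R_{1\eta}(E_{\alpha R},E_{\alpha I}) = \cos\beta_{\eta,\eta}\left(\cos\theta_{d\eta}\frac{\partial E_{\alpha R}}{\partial\eta}+\sin\theta_{d\eta}\frac{\partial E_{\alpha I}}{\partial\eta}\right) = \cos\beta_{\eta,\eta}\cos\Phi_{E\alpha,\eta}\,W_{E\alpha,\eta} \approx \cos\beta_{\eta,\eta}\cos(\theta_{E\alpha}-\theta_{d\eta})\frac{\partial E_\alpha}{\partial\eta} \tag{56}$$

$$I_{1\eta}(E_{\alpha R},E_{\alpha I}) = \cos\beta_{\eta,\eta}\left(-\sin\theta_{d\eta}\frac{\partial E_{\alpha R}}{\partial\eta}+\cos\theta_{d\eta}\frac{\partial E_{\alpha I}}{\partial\eta}\right) = \cos\beta_{\eta,\eta}\sin\Phi_{E\alpha,\eta}\,W_{E\alpha,\eta} \approx \cos\beta_{\eta,\eta}\sin(\theta_{E\alpha}-\theta_{d\eta})\frac{\partial E_\alpha}{\partial\eta} \tag{57}$$

$$R_{1\eta}(B_{\alpha R},B_{\alpha I}) = \cos\beta_{\eta,\eta}\left(\cos\theta_{d\eta}\frac{\partial B_{\alpha R}}{\partial\eta}+\sin\theta_{d\eta}\frac{\partial B_{\alpha I}}{\partial\eta}\right) = \cos\beta_{\eta,\eta}\cos\Phi_{B\alpha,\eta}\,W_{B\alpha,\eta} \approx \cos\beta_{\eta,\eta}\cos(\theta_{B\alpha}-\theta_{d\eta})\frac{\partial B_\alpha}{\partial\eta} \tag{58}$$

$$I_{1\eta}(B_{\alpha R},B_{\alpha I}) = \cos\beta_{\eta,\eta}\left(-\sin\theta_{d\eta}\frac{\partial B_{\alpha R}}{\partial\eta}+\cos\theta_{d\eta}\frac{\partial B_{\alpha I}}{\partial\eta}\right) = \cos\beta_{\eta,\eta}\sin\Phi_{B\alpha,\eta}\,W_{B\alpha,\eta} \approx \cos\beta_{\eta,\eta}\sin(\theta_{B\alpha}-\theta_{d\eta})\frac{\partial B_\alpha}{\partial\eta} \tag{59}$$

The approximate forms hold when the phase gradients $\partial\theta_{E\alpha}/\partial\eta$ and $\partial\theta_{B\alpha}/\partial\eta$ are small. In that case $\beta_{E\alpha,\eta} \approx \beta_{B\alpha,\eta} \approx 0$ and each W reduces to the ordinary derivative of the magnitude. The real and imaginary components of the field vectors are

$$E_{\alpha R} = E_\alpha\cos\theta_{E\alpha}, \qquad E_{\alpha I} = E_\alpha\sin\theta_{E\alpha}$$

$$B_{\alpha R} = B_\alpha\cos\theta_{B\alpha}, \qquad B_{\alpha I} = B_\alpha\sin\theta_{B\alpha}$$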

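The second derivatives of the field vectors are obtained by applying the first derivative operators of equation (54) twice in succession, as follows

$$\frac{\partial^2\bar E_\alpha}{\partial\bar\eta^2} = \cos\beta_{\eta,\eta}\,e^{-j\theta_{d\eta}}\,\frac{\partial}{\partial\eta}\left(\cos\beta_{\eta,\eta}\,e^{-j\theta_{d\eta}}\,\frac{\partial\bar E_\alpha}{\partial\eta}\right)$$

where α = x, y, z and η = x, y, z, t. For simplicity it will be assumed that $\beta_{\eta,\eta}$ and $\theta_{d\eta}$ are slowly varying functions of η. Within this approximation, the second derivatives of the field vectors are written as

$$\frac{\partial^2\bar E_\alpha}{\partial\bar\eta^2} \approx \cos^2\beta_{\eta,\eta}\,e^{-2j\theta_{d\eta}}\,\frac{\partial^2\bar E_\alpha}{\partial\eta^2} = R_{2\eta}(E_{\alpha R},E_{\alpha I}) + j\,I_{2\eta}(E_{\alpha R},E_{\alpha I}) = \cos^2\beta_{\eta,\eta}\,e^{j\Psi_{E\alpha,\eta}}\,T_{E\alpha,\eta} \approx \cos^2\beta_{\eta,\eta}\,e^{j(\theta_{E\alpha}-2\theta_{d\eta})}\,\frac{\partial^2E_\alpha}{\partial\eta^2}$$

$$\frac{\partial^2\bar H_\alpha}{\partial\bar\eta^2} \approx \cos^2\beta_{\eta,\eta}\,e^{-2j\theta_{d\eta}}\,\frac{\partial^2\bar H_\alpha}{\partial\eta^2} = R_{2\eta}(H_{\alpha R},H_{\alpha I}) + j\,I_{2\eta}(H_{\alpha R},H_{\alpha I}) = \cos^2\beta_{\eta,\eta}\,e^{j\Psi_{H\alpha,\eta}}\,T_{H\alpha,\eta} \approx \cos^2\beta_{\eta,\eta}\,e^{j(\theta_{H\alpha}-2\theta_{d\eta})}\,\frac{\partial^2H_\alpha}{\partial\eta^2} \tag{66}$$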
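where the magnitudes $T_{E\alpha,\eta}$, $T_{H\alpha,\eta}$ and the phases $\Psi_{E\alpha,\eta}$, $\Psi_{H\alpha,\eta}$ are defined by equations (66A) through (66D), and where

$$R_{2\eta}(E_{\alpha R},E_{\alpha I}) = \cos^2\beta_{\eta,\eta}\left[\cos(2\theta_{d\eta})\frac{\partial^2E_{\alpha R}}{\partial\eta^2}+\sin(2\theta_{d\eta})\frac{\partial^2E_{\alpha I}}{\partial\eta^2}\right] = \cos^2\beta_{\eta,\eta}\cos\Psi_{E\alpha,\eta}\,T_{E\alpha,\eta} \approx \cos^2\beta_{\eta,\eta}\cos(\theta_{E\alpha}-2\theta_{d\eta})\frac{\partial^2E_\alpha}{\partial\eta^2} \tag{67}$$

$$I_{2\eta}(E_{\alpha R},E_{\alpha I}) = \cos^2\beta_{\eta,\eta}\left[-\sin(2\theta_{d\eta})\frac{\partial^2E_{\alpha R}}{\partial\eta^2}+\cos(2\theta_{d\eta})\frac{\partial^2E_{\alpha I}}{\partial\eta^2}\right] = \cos^2\beta_{\eta,\eta}\sin\Psi_{E\alpha,\eta}\,T_{E\alpha,\eta} \approx \cos^2\beta_{\eta,\eta}\sin(\theta_{E\alpha}-2\theta_{d\eta})\frac{\partial^2E_\alpha}{\partial\eta^2} \tag{68}$$

$$R_{2\eta}(H_{\alpha R},H_{\alpha I}) = \cos^2\beta_{\eta,\eta}\left[\cos(2\theta_{d\eta})\frac{\partial^2H_{\alpha R}}{\partial\eta^2}+\sin(2\theta_{d\eta})\frac{\partial^2H_{\alpha I}}{\partial\eta^2}\right] = \cos^2\beta_{\eta,\eta}\cos\Psi_{H\alpha,\eta}\,T_{H\alpha,\eta} \approx \cos^2\beta_{\eta,\eta}\cos(\theta_{H\alpha}-2\theta_{d\eta})\frac{\partial^2H_\alpha}{\partial\eta^2} \tag{69}$$

$$I_{2\eta}(H_{\alpha R},H_{\alpha I}) = \cos^2\beta_{\eta,\eta}\left[-\sin(2\theta_{d\eta})\frac{\partial^2H_{\alpha R}}{\partial\eta^2}+\cos(2\theta_{d\eta})\frac{\partial^2H_{\alpha I}}{\partial\eta^2}\right] = \cos^2\beta_{\eta,\eta}\sin\Psi_{H\alpha,\eta}\,T_{H\alpha,\eta} \approx \cos^2\beta_{\eta,\eta}\sin(\theta_{H\alpha}-2\theta_{d\eta})\frac{\partial^2H_\alpha}{\partial\eta^2} \tag{70}$$

and

$$\tan\delta_{E\alpha,\eta} = E_\alpha\,\frac{\partial^2\theta_{E\alpha}/\partial\eta^2}{\partial^2E_\alpha/\partial\eta^2} \tag{70A}$$

$$\tan\delta_{H\alpha,\eta} = H_\alpha\,\frac{\partial^2\theta_{H\alpha}/\partial\eta^2}{\partial^2H_\alpha/\partial\eta^2} \tag{70B}$$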
Similar equations can be written for the $\bar D_\alpha$ and $\bar B_\alpha$ components. These provide all the derivatives needed to evaluate Maxwell's equations in asymmetric bulk matter or vacuum.


3. MAXWELL'S EQUATIONS WITH BROKEN INTERNAL SYMMETRIES. This section develops Maxwell's equations for electromagnetic fields whose field vectors have internal phase angles. The asymmetric bulk matter or vacuum in which the electromagnetic fields exist also has broken internal symmetries in three places: the static pressure field, the Grüneisen parameter, and the space and time coordinates of each point in the system. The internal phase angles of the ambient medium, such as $\theta_P$, $\theta_\gamma$, $\theta_x$, $\theta_y$, $\theta_z$, and $\theta_t$, must be calculated in conjunction with the internal phase angles of the electromagnetic field vectors. The quantities $\theta_P$ and $\theta_\gamma$ are obtained from the ground state relativistic trace equation (1) and from equation (4), which defines the relativistic Grüneisen function.


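The unrenormalized Maxwell equations for charges and currents are written as

$$\nabla\cdot\mathbf B^a = 0 \tag{71}$$

$$\nabla\cdot\mathbf D^a = \rho_q^a \tag{72}$$

$$\nabla\times\mathbf H^a = \frac{\partial\mathbf D^a}{\partial t} + \mathbf j^a \tag{73}$$

$$\nabla\times\mathbf E^a = -\frac{\partial\mathbf B^a}{\partial t} \tag{74}$$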

where:

- $\mathbf B^a$ = unrenormalized magnetic induction vector
- $\mathbf D^a$ = unrenormalized electric displacement vector
- $\rho_q^a$ = unrenormalized charge density
- $\mathbf H^a$ = unrenormalized magnetic field vector
- $\mathbf j^a$ = unrenormalized current density vector
- $\mathbf E^a$ = unrenormalized electric field vector

Equations (71) through (74) represent eight equations, six of which are independent. The simplest constitutive equations are the following

$$\mathbf B^a = \mu^a\mathbf H^a \tag{75}$$

$$\mathbf D^a = \varepsilon^a\mathbf E^a \tag{76}$$

where $\mu^a$ = unrenormalized magnetic permeability, and $\varepsilon^a$ = unrenormalized dielectric constant (permittivity). More complicated constitutive equations are often used. In general

$$\mu^a = \mu^a(P^a,\gamma^a), \qquad \varepsilon^a = \varepsilon^a(P^a,\gamma^a) \tag{77}$$

where $P^a$ and $\gamma^a$ are functions of density and temperature.

Within asymmetric bulk matter or vacuum, a similar set of Maxwell's equations must be valid. The difference is that the renormalized electric and magnetic field vectors now have internal phases, and so do the space and time coordinates. Therefore equations (71) through (74) can be written for the electromagnetic field in bulk matter or vacuum with broken internal symmetries as follows

$$\bar\nabla\cdot\bar{\mathbf B} = 0 \tag{78}$$

$$\bar\nabla\cdot\bar{\mathbf D} = \bar\rho_q \tag{79}$$

$$\bar\nabla\times\bar{\mathbf H} = \frac{\partial\bar{\mathbf D}}{\partial\bar t} + \bar{\mathbf j} \tag{80}$$

$$\bar\nabla\times\bar{\mathbf E} = -\frac{\partial\bar{\mathbf B}}{\partial\bar t} \tag{81}$$

where:

- $\bar{\mathbf B}$ = renormalized complex number magnetic induction vector
- $\bar{\mathbf D}$ = renormalized complex number electric displacement vector
- $\bar\rho_q$ = renormalized charge density
- $\bar{\mathbf H}$ = renormalized complex number magnetic field vector
- $\bar{\mathbf j}$ = renormalized complex number current density vector
- $\bar{\mathbf E}$ = renormalized complex number electric field vector

Equations (78) through (81) are complex number vector equations. They represent a total of sixteen equations, twelve of which are independent. Note that $\bar\rho_q \ne \rho_q^a$, as can be seen from equations (72) and (79), because $\bar{\mathbf D} \ne \mathbf D^a$. Since $\bar\rho_q = n\bar q$ and $\rho_q^a = nq^a$, where $\bar q$ and $q^a$ are the renormalized and unrenormalized charge per particle, it follows that $\bar q \ne q^a$. The renormalized current density is given by $\bar{\mathbf j} = n\bar q\,\bar{\mathbf v}$, where $\bar{\mathbf v}$ is the vector particle velocity with internal phase. The internal phase angle of the particle velocity is a function of $\theta_x$, $\theta_y$, $\theta_z$, and $\theta_t$, and clearly $\bar{\mathbf j} \ne \mathbf j^a$.

The simplest renormalized constitutive equations are written as

$$\bar{\mathbf B} = \bar\mu\,\bar{\mathbf H} \tag{82}$$

$$\bar{\mathbf D} = \bar\varepsilon\,\bar{\mathbf E} \tag{83}$$

where $\bar\mu$ = renormalized magnetic permeability, and $\bar\varepsilon$ = renormalized permittivity. Since equations (82) and (83) are vector equations with real and imaginary parts, they represent twelve equations.
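The state equations for $\bar\mu$ and $\bar\varepsilon$ are assumed to be

$$\bar\mu = \bar\mu(\bar P,\bar\gamma) \tag{84}$$

$$\bar\varepsilon = \bar\varepsilon(\bar P,\bar\gamma) \tag{85}$$

where the functions are evaluated at the values of the renormalized pressure and Grüneisen function determined from a solution of equation (1).

The component form of the renormalized Maxwell equations (78) through (81) is

$$\frac{\partial\bar B_x}{\partial\bar x} + \frac{\partial\bar B_y}{\partial\bar y} + \frac{\partial\bar B_z}{\partial\bar z} = 0 \tag{86}$$

$$\frac{\partial\bar D_x}{\partial\bar x} + \frac{\partial\bar D_y}{\partial\bar y} + \frac{\partial\bar D_z}{\partial\bar z} = \bar\rho_q \tag{87}$$

$$\frac{\partial\bar H_y}{\partial\bar x} - \frac{\partial\bar H_x}{\partial\bar y} = \frac{\partial\bar D_z}{\partial\bar t} + \bar j_z \tag{88}$$

$$\frac{\partial\bar H_z}{\partial\bar y} - \frac{\partial\bar H_y}{\partial\bar z} = \frac{\partial\bar D_x}{\partial\bar t} + \bar j_x \tag{89}$$

$$\frac{\partial\bar H_x}{\partial\bar z} - \frac{\partial\bar H_z}{\partial\bar x} = \frac{\partial\bar D_y}{\partial\bar t} + \bar j_y \tag{90}$$

$$\frac{\partial\bar E_y}{\partial\bar x} - \frac{\partial\bar E_x}{\partial\bar y} = -\frac{\partial\bar B_z}{\partial\bar t} \tag{91}$$

$$\frac{\partial\bar E_z}{\partial\bar y} - \frac{\partial\bar E_y}{\partial\bar z} = -\frac{\partial\bar B_x}{\partial\bar t} \tag{92}$$

$$\frac{\partial\bar E_x}{\partial\bar z} - \frac{\partial\bar E_z}{\partial\bar x} = -\frac{\partial\bar B_y}{\partial\bar t} \tag{93}$$

Using equations (33) through (49) to evaluate the derivatives, Maxwell's equations can be rewritten as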


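$$W_{Bx,x}\cos\Phi_{Bx,x}\cos\beta_{x,x} + W_{By,y}\cos\Phi_{By,y}\cos\beta_{y,y} + W_{Bz,z}\cos\Phi_{Bz,z}\cos\beta_{z,z} = 0 \tag{94}$$

$$W_{Bx,x}\sin\Phi_{Bx,x}\cos\beta_{x,x} + W_{By,y}\sin\Phi_{By,y}\cos\beta_{y,y} + W_{Bz,z}\sin\Phi_{Bz,z}\cos\beta_{z,z} = 0 \tag{95}$$

$$W_{Dx,x}\cos\Phi_{Dx,x}\cos\beta_{x,x} + W_{Dy,y}\cos\Phi_{Dy,y}\cos\beta_{y,y} + W_{Dz,z}\cos\Phi_{Dz,z}\cos\beta_{z,z} = \rho_q \tag{96}$$

$$W_{Dx,x}\sin\Phi_{Dx,x}\cos\beta_{x,x} + W_{Dy,y}\sin\Phi_{Dy,y}\cos\beta_{y,y} + W_{Dz,z}\sin\Phi_{Dz,z}\cos\beta_{z,z} = 0 \tag{97}$$

$$W_{Hy,x}\cos\Phi_{Hy,x}\cos\beta_{x,x} - W_{Hx,y}\cos\Phi_{Hx,y}\cos\beta_{y,y} = W_{Dz,t}\cos\Phi_{Dz,t}\cos\beta_{t,t} + j_z\cos\theta_{jz} \tag{98}$$

$$W_{Hy,x}\sin\Phi_{Hy,x}\cos\beta_{x,x} - W_{Hx,y}\sin\Phi_{Hx,y}\cos\beta_{y,y} = W_{Dz,t}\sin\Phi_{Dz,t}\cos\beta_{t,t} + j_z\sin\theta_{jz} \tag{99}$$

$$W_{Hz,y}\cos\Phi_{Hz,y}\cos\beta_{y,y} - W_{Hy,z}\cos\Phi_{Hy,z}\cos\beta_{z,z} = W_{Dx,t}\cos\Phi_{Dx,t}\cos\beta_{t,t} + j_x\cos\theta_{jx} \tag{100}$$

$$W_{Hz,y}\sin\Phi_{Hz,y}\cos\beta_{y,y} - W_{Hy,z}\sin\Phi_{Hy,z}\cos\beta_{z,z} = W_{Dx,t}\sin\Phi_{Dx,t}\cos\beta_{t,t} + j_x\sin\theta_{jx} \tag{101}$$

$$W_{Hx,z}\cos\Phi_{Hx,z}\cos\beta_{z,z} - W_{Hz,x}\cos\Phi_{Hz,x}\cos\beta_{x,x} = W_{Dy,t}\cos\Phi_{Dy,t}\cos\beta_{t,t} + j_y\cos\theta_{jy} \tag{102}$$

$$W_{Hx,z}\sin\Phi_{Hx,z}\cos\beta_{z,z} - W_{Hz,x}\sin\Phi_{Hz,x}\cos\beta_{x,x} = W_{Dy,t}\sin\Phi_{Dy,t}\cos\beta_{t,t} + j_y\sin\theta_{jy} \tag{103}$$

$$W_{Ey,x}\cos\Phi_{Ey,x}\cos\beta_{x,x} - W_{Ex,y}\cos\Phi_{Ex,y}\cos\beta_{y,y} = -W_{Bz,t}\cos\Phi_{Bz,t}\cos\beta_{t,t} \tag{104}$$

$$W_{Ey,x}\sin\Phi_{Ey,x}\cos\beta_{x,x} - W_{Ex,y}\sin\Phi_{Ex,y}\cos\beta_{y,y} = -W_{Bz,t}\sin\Phi_{Bz,t}\cos\beta_{t,t} \tag{105}$$

$$W_{Ez,y}\cos\Phi_{Ez,y}\cos\beta_{y,y} - W_{Ey,z}\cos\Phi_{Ey,z}\cos\beta_{z,z} = -W_{Bx,t}\cos\Phi_{Bx,t}\cos\beta_{t,t} \tag{106}$$

$$W_{Ez,y}\sin\Phi_{Ez,y}\cos\beta_{y,y} - W_{Ey,z}\sin\Phi_{Ey,z}\cos\beta_{z,z} = -W_{Bx,t}\sin\Phi_{Bx,t}\cos\beta_{t,t} \tag{107}$$

$$W_{Ex,z}\cos\Phi_{Ex,z}\cos\beta_{z,z} - W_{Ez,x}\cos\Phi_{Ez,x}\cos\beta_{x,x} = -W_{By,t}\cos\Phi_{By,t}\cos\beta_{t,t} \tag{108}$$

$$W_{Ex,z}\sin\Phi_{Ex,z}\cos\beta_{z,z} - W_{Ez,x}\sin\Phi_{Ez,x}\cos\beta_{x,x} = -W_{By,t}\sin\Phi_{By,t}\cos\beta_{t,t} \tag{109}$$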

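Maxwell's equations can be written in an alternative but equivalent form by using equations (54) through (59), as follows

$$R_{1x}(B_{xR},B_{xI}) + R_{1y}(B_{yR},B_{yI}) + R_{1z}(B_{zR},B_{zI}) = 0 \tag{110}$$

$$I_{1x}(B_{xR},B_{xI}) + I_{1y}(B_{yR},B_{yI}) + I_{1z}(B_{zR},B_{zI}) = 0 \tag{111}$$

$$R_{1x}(D_{xR},D_{xI}) + R_{1y}(D_{yR},D_{yI}) + R_{1z}(D_{zR},D_{zI}) = \rho_q \tag{112}$$

$$I_{1x}(D_{xR},D_{xI}) + I_{1y}(D_{yR},D_{yI}) + I_{1z}(D_{zR},D_{zI}) = 0 \tag{113}$$

$$R_{1x}(H_{yR},H_{yI}) - R_{1y}(H_{xR},H_{xI}) = R_{1t}(D_{zR},D_{zI}) + j_{zR} \tag{114}$$

$$I_{1x}(H_{yR},H_{yI}) - I_{1y}(H_{xR},H_{xI}) = I_{1t}(D_{zR},D_{zI}) + j_{zI} \tag{115}$$

$$R_{1y}(H_{zR},H_{zI}) - R_{1z}(H_{yR},H_{yI}) = R_{1t}(D_{xR},D_{xI}) + j_{xR} \tag{116}$$

$$I_{1y}(H_{zR},H_{zI}) - I_{1z}(H_{yR},H_{yI}) = I_{1t}(D_{xR},D_{xI}) + j_{xI} \tag{117}$$

$$R_{1z}(H_{xR},H_{xI}) - R_{1x}(H_{zR},H_{zI}) = R_{1t}(D_{yR},D_{yI}) + j_{yR} \tag{118}$$

$$I_{1z}(H_{xR},H_{xI}) - I_{1x}(H_{zR},H_{zI}) = I_{1t}(D_{yR},D_{yI}) + j_{yI} \tag{119}$$

$$R_{1x}(E_{yR},E_{yI}) - R_{1y}(E_{xR},E_{xI}) = -R_{1t}(B_{zR},B_{zI}) \tag{120}$$

$$I_{1x}(E_{yR},E_{yI}) - I_{1y}(E_{xR},E_{xI}) = -I_{1t}(B_{zR},B_{zI}) \tag{121}$$

$$R_{1y}(E_{zR},E_{zI}) - R_{1z}(E_{yR},E_{yI}) = -R_{1t}(B_{xR},B_{xI}) \tag{122}$$

$$I_{1y}(E_{zR},E_{zI}) - I_{1z}(E_{yR},E_{yI}) = -I_{1t}(B_{xR},B_{xI}) \tag{123}$$

$$R_{1z}(E_{xR},E_{xI}) - R_{1x}(E_{zR},E_{zI}) = -R_{1t}(B_{yR},B_{yI}) \tag{124}$$

$$I_{1z}(E_{xR},E_{xI}) - I_{1x}(E_{zR},E_{zI}) = -I_{1t}(B_{yR},B_{yI}) \tag{125}$$

where

$$j_{zR} = j_z\cos\theta_{jz}, \qquad j_{zI} = j_z\sin\theta_{jz} \tag{126}$$

$$j_{xR} = j_x\cos\theta_{jx}, \qquad j_{xI} = j_x\sin\theta_{jx} \tag{127}$$

$$j_{yR} = j_y\cos\theta_{jy}, \qquad j_{yI} = j_y\sin\theta_{jy} \tag{128}$$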

For isotropic radiation with broken internal symmetry, the radiation pressure is related to the radiation energy density by the following approximate formula

$$\bar P_r = P_r\,e^{j\theta_{Pr}} \approx \frac{\bar E_r}{3} \tag{129}$$

Equation (129) is exact only for symmetrical isotropic radiation. It gives the following approximate equations

$$P_r \approx \frac{E_r}{3} \tag{130}$$

$$\theta_{Pr} \approx \theta_{Er} \tag{131}$$

where $\bar E_r = E_r e^{j\theta_{Er}}$ is the radiation energy density with broken internal symmetry. It is related to the electromagnetic field vectors as follows

$$\bar E_r = \frac{\varepsilon}{2}\left(\bar E_x^{\,2}+\bar E_y^{\,2}+\bar E_z^{\,2}\right) + \frac{\mu}{2}\left(\bar H_x^{\,2}+\bar H_y^{\,2}+\bar H_z^{\,2}\right) \tag{132}$$

Equation (132) is equivalent to the following two equations

$$E_r\cos\theta_{Er} = \frac{\varepsilon}{2}\left[E_x^2\cos(2\theta_{Ex})+E_y^2\cos(2\theta_{Ey})+E_z^2\cos(2\theta_{Ez})\right] + \frac{\mu}{2}\left[H_x^2\cos(2\theta_{Hx})+H_y^2\cos(2\theta_{Hy})+H_z^2\cos(2\theta_{Hz})\right] \tag{133}$$

$$E_r\sin\theta_{Er} = \frac{\varepsilon}{2}\left[E_x^2\sin(2\theta_{Ex})+E_y^2\sin(2\theta_{Ey})+E_z^2\sin(2\theta_{Ez})\right] + \frac{\mu}{2}\left[H_x^2\sin(2\theta_{Hx})+H_y^2\sin(2\theta_{Hy})+H_z^2\sin(2\theta_{Hz})\right] \tag{134}$$

from which $E_r$ and $\theta_{Er}$ can be obtained immediately. The unrenormalized radiation energy density is given by equation (132) with the bars removed and with the superscript "a" inserted on all quantities.

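The relation between the complex form (132) and the component form (133)-(134) can be checked numerically. The short Python sketch below uses arbitrary illustrative field values (not data from this paper), follows equation (132) exactly as written here (factor 1/2, no $\varepsilon$ or $\mu$), and applies the approximate relation (129) at the end; it illustrates the algebra only and is not part of the original calculation.

```python
import cmath
import math

def energy_density_complex(E, H):
    """Eq. (132): (1/2)*sum(E_a^2) + (1/2)*sum(H_a^2) using complex (not modulus) squares."""
    return 0.5 * sum(e * e for e in E) + 0.5 * sum(h * h for h in H)

def energy_density_components(E, H):
    """Eqs. (133)-(134): real and imaginary parts built only from magnitudes and phases."""
    fields = list(E) + list(H)
    real = 0.5 * sum(abs(f) ** 2 * math.cos(2 * cmath.phase(f)) for f in fields)
    imag = 0.5 * sum(abs(f) ** 2 * math.sin(2 * cmath.phase(f)) for f in fields)
    return real, imag

# Illustrative complex components E_x, E_y, E_z and H_x, H_y, H_z (arbitrary units).
E = [cmath.rect(1.0, 0.10), cmath.rect(0.5, -0.20), cmath.rect(0.2, 0.05)]
H = [cmath.rect(0.8, 0.15), cmath.rect(0.3, 0.00), cmath.rect(0.1, -0.10)]

Er_bar = energy_density_complex(E, H)
real, imag = energy_density_components(E, H)
Er, theta_Er = abs(Er_bar), cmath.phase(Er_bar)   # E_r and theta_Er
Pr_bar = Er_bar / 3                               # approximate relation (129)
print(f"E_r = {Er:.6f}, theta_Er = {theta_Er:.6f} rad")
```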

In addition to Maxwell's equations, several other equations are required to form a complete set of equations to determine the phase angles of the spacetime coordinates as well as the phase angles of the electromagnetic field vectors.

Six of the additional equations required are the equations of motion for charged bulk matter (plasma). These six equations are given by the complex number vector nonrelativistic Euler equations combined with the Lorentz force as follows

$$\rho\,\frac{d\bar v_\alpha}{d\bar t} = -\frac{\partial\bar P}{\partial\bar\alpha} - \frac{\partial\bar P_r}{\partial\bar\alpha} - \frac{\partial\bar W}{\partial\bar\alpha} + \rho_q\left(\bar{\mathbf E} + \bar{\mathbf v}\times\bar{\mathbf B}\right)_\alpha \qquad (135)$$

where $\alpha = x, y, z$; $\rho$ = mass density; $\bar v_\alpha$ = spatial components of particle velocity with internal phase; $\bar P$ = static pressure with internal phase; $\bar P_r$ = radiation pressure with internal phase; $\bar W$ = external potential (such as gravity) with internal phase; $\rho_q$ = charge density; and $\bar{\mathbf v}$ = particle velocity vector with internal phase $= (\bar v_x, \bar v_y, \bar v_z)$. The static pressure and external potential are complex numbers with internal phase angles and are written as

$$\bar P = Pe^{j\theta_P} \qquad (136)$$

$$\bar W = We^{j\theta_W} \qquad (137)$$

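The time derivatives of the velocity components in equation (135) are given by the following six equations

$$\bar a_\alpha = \frac{d\bar v_\alpha}{d\bar t} = \frac{\partial\bar v_\alpha}{\partial\bar t} + \sum_{\kappa=x,y,z}\bar v_\kappa\frac{\partial\bar v_\alpha}{\partial\bar\kappa} \qquad (138)$$

where the following six equations define the complex number velocity

$$\bar v_\alpha = \frac{d\bar\alpha}{d\bar t} = v_\alpha e^{j\theta_{v\alpha}} \qquad (139)$$

$$v_\alpha^2 = \frac{(d\alpha/dt)^2 + (\alpha\,d\theta_\alpha/dt)^2}{1 + (t\,d\theta_t/dt)^2} \qquad (140)$$

$$\theta_{v\alpha} = \theta_\alpha + \beta_{\alpha,t} - \theta_t - \beta_{t,t} \qquad (141)$$

where

$$\tan\beta_{\alpha,t} = \frac{\alpha\,d\theta_\alpha/dt}{d\alpha/dt} \qquad (142)$$

$$\frac{d\theta_\alpha}{dt} = \frac{\partial\theta_\alpha}{\partial t} + \sum_{\kappa=x,y,z}\frac{\partial\theta_\alpha}{\partial\kappa}\frac{d\kappa}{dt} \qquad (143)$$

In addition, the continuity equation

$$\frac{\partial\rho_q}{\partial\bar t} + \bar\nabla\cdot\left(\rho_q\bar{\mathbf v}\right) = 0 \qquad (144A)$$

is necessary to determine $\theta_t$ and $\rho_q(x,y,z,t)$. Equation (144A) has two components because it is a complex number scalar equation.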


The Maxwell equations for broken symmetry matter, equations (94) through (109) or equivalently equations (110) through (125), are not sufficient by themselves to determine the internal phase angles of the space and time coordinates. This is because the twelve independent Maxwell equations (88) through (93), the twelve constitutive equations (82) and (83), the two components of the ground state trace equation (1), the two components of the ground state Grüneisen parameter equation (4), the two components of the excited state trace equation (5), the two components of the radiation Grüneisen parameter equation (6A), the two state equations (84) and (85) for the renormalized magnetic permeability and electric permittivity, and the two components of the continuity equation represent thirty-six equations. However, thus far only thirty-five field and matter variables have been enumerated, and these are: $E_\alpha, \theta_{E\alpha}$; $H_\alpha, \theta_{H\alpha}$; $B_\alpha, \theta_{B\alpha}$; $D_\alpha, \theta_{D\alpha}$; $E, \theta_E$; $\gamma, \theta_\gamma$; $E_r, \theta_{Er}$; $\gamma_r, \theta_{\gamma r}$; $\varepsilon, \mu$; and $\rho_q$. But these thirty-five quantities are related to nineteen kinematic and dynamic variables because of the space and time derivatives of the field vectors in Maxwell's equations (33) through (48) and because of the appearance of the current density (velocity) in Maxwell's equations. The nineteen kinematic and dynamic variables are: $x, y, z, v_x, v_y, v_z, a_x, a_y, a_z$, the corresponding phase angles $\theta_x, \theta_y, \theta_z, \theta_{vx}, \theta_{vy}, \theta_{vz}, \theta_{ax}, \theta_{ay}, \theta_{az}$, and by itself $\theta_t$. In these calculations the magnitude of the time $t$ is taken to be a free and independent variable. Therefore a total of fifty-four unknown quantities need to be calculated, and thus far only thirty-six equations have been enumerated. The additional necessary eighteen equations are: the six equations of motion (135), the six kinematic acceleration equations (138), and the six kinematic velocity equations (139). Thus there are fifty-four equations and fifty-four unknown variables to be determined. Note that the two components of the complex number scalar continuity equation introduce only one new unknown quantity $\rho_q$, and this leaves the second component equation to determine $\theta_t$, which stands by itself because $t$ is taken to be an independent variable.
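The counting argument above can be restated as a small tally. This Python sketch only re-adds the numbers quoted in the text; the dictionary labels are descriptive names chosen for the illustration, not notation from the paper.

```python
# Bookkeeping sketch: re-adds the counts of unknowns and equations stated in the text.
field_matter_unknowns = {
    "E_a, H_a, B_a, D_a with their phases": 4 * 3 * 2,
    "E, theta_E (ground state energy density)": 2,
    "gamma, theta_gamma (ground state Grueneisen parameter)": 2,
    "E_r, theta_Er (radiation energy density)": 2,
    "gamma_r, theta_gamma_r (radiation Grueneisen parameter)": 2,
    "epsilon, mu": 2,
    "rho_q": 1,
}
kinematic_unknowns = {
    "x, y, z, v_a, a_a (magnitudes)": 9,
    "their phase angles": 9,
    "theta_t": 1,
}
equations = {
    "Maxwell (88)-(93)": 12,
    "constitutive (82)-(83)": 12,
    "trace (1), Grueneisen (4), trace (5), Grueneisen (6A)": 4 * 2,
    "state equations (84)-(85)": 2,
    "continuity (144A)": 2,
    "motion (135), acceleration (138), velocity (139)": 3 * 6,
}

n_field = sum(field_matter_unknowns.values())
n_kin = sum(kinematic_unknowns.values())
n_eq = sum(equations.values())
print(n_field, n_kin, n_field + n_kin, n_eq)  # 35 19 54 54
```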


The relativistic trace equations (1) and (5) play an important part in the calculation of the renormalized electromagnetic fields in asymmetric bulk matter or vacuum. Starting with the unrenormalized ground state energy density and Grüneisen parameter, $E^a$ and $\gamma^a$ respectively, equation (1) is used to calculate the renormalized values of the ground state energy density $E, \theta_E$ and the ground state Grüneisen parameter $\gamma, \theta_\gamma$. The renormalized values of magnetic permeability $\mu$ and dielectric constant $\varepsilon$ are expressed in terms of $P$ and $\gamma$ through equations (84) and (85). In addition to Maxwell's equations, the radiation trace equation (5), in conjunction with equations (133) and (134) that relate the radiation energy density to the electromagnetic field vectors, determines the renormalized field vectors in terms of the corresponding unrenormalized values. The solution of the unrenormalized Maxwell equations (71) through (74) gives the unrenormalized field vectors in terms of the unrenormalized charge density $\rho_q^a$ and current density $\mathbf j^a$. The unrenormalized energy density is then calculated in terms of the unrenormalized field vectors using equations (133) and (134). Then equation (5) is applied again in conjunction with equations (133) and (134) and Maxwell's equations to obtain the renormalized field vectors. Finally, the renormalized charge and current density are obtained from the renormalized field vectors by using equations (79) and (80).


For electromagnetic waves in the vacuum, the total density $\rho$ and the charge density $\rho_q$ that appear in equation (135) refer to test charges placed within the vacuum to measure the electromagnetic field strengths. Therefore the case of electromagnetic waves in the vacuum is formally equivalent to the case of electromagnetic waves in bulk matter. For the vacuum

where (v) refers to the vacuum state (see Section 6).

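The energy conservation equation is the first integral of equation (135) and in its simplest form is written as

$$\tfrac{1}{2}\rho\bar v^2 + \bar P + \bar P_r + \bar W - \rho_q\int\bar{\mathbf E}\cdot\bar{\mathbf v}\,d\bar t = \text{constant} \qquad (146)$$

where $\bar{\mathbf v}$ = complex number vector velocity whose complex number magnitude is given by

$$\bar v^2 = \bar v_x^2 + \bar v_y^2 + \bar v_z^2 \qquad (147)$$

where

$$\bar v = ve^{j\theta_v} \qquad (147A)$$

$$\bar v_\alpha = v_\alpha e^{j\theta_{v\alpha}} \qquad (147B)$$

Note that

$$\bar{\mathbf E}\cdot\bar{\mathbf v} = \sum_{\alpha=1}^{3}\bar E_\alpha\bar v_\alpha \qquad (148)$$

Were it possible to neglect the charge density term by making $\rho_q$ vanishingly small, equation (146) would become

$$\tfrac{1}{2}\rho\bar v^2 + \bar P_r + \bar P + \bar W = \text{constant} \qquad (149)$$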

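where the mass density $\rho$ refers to a test probe. Finally, the dynamical equations for relativistic bulk matter with broken internal symmetry are given by the following generalization of equation (135)

$$\left[\rho + (\bar P + \bar P_r)/c^2\right]\bar\gamma^2\,\frac{d\bar v_\alpha}{d\bar t} = -\frac{\partial\bar P}{\partial\bar\alpha} - \frac{\partial\bar P_r}{\partial\bar\alpha} + \rho_q\bar\gamma\left(\bar{\mathbf E} + \bar{\mathbf v}\times\bar{\mathbf B}\right)_\alpha \qquad (150)$$

where $\bar\gamma$ = complex velocity factor that is defined in Section 4, and where $c$ = light speed in the vacuum.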
4. LORENTZ INVARIANCE IN ASYMMETRIC BULK MATTER AND VACUUM. This section considers the Lorentz invariance of Maxwell's equations in bulk matter and vacuum with broken internal symmetries. Maxwell's equations for symmetric systems, equations (71) through (74), are invariant under the Lorentz transformation of coordinate systems

$$x'_s = \gamma_s(x_s - v_s t_s) \qquad (151)$$

$$t'_s = \gamma_s(t_s - v_s x_s/c^2) \qquad (152)$$

where $v_s$ = relative speed of coordinate systems, and the standard velocity factor is given by

$$\gamma_s = (1 - \beta_s^2)^{-1/2} \qquad (153)$$

where $\beta_s = v_s/c$, and $c$ = light speed in vacuum. The Lorentz transformation can be obtained by requiring the form invariance of the Minkowski metric as follows

$$x'^2_s - c^2t'^2_s = x_s^2 - c^2t_s^2 \qquad (154)$$

General relativity, which is not considered in this paper, uses a Riemann metric.

The form of Maxwell's equations for charges and currents in asymmetric bulk matter or vacuum, equations (78) through (81), is the same as that for symmetric bulk matter or vacuum, equations (71) through (74). The only difference is that in asymmetric systems the field vectors, current density, and spacetime coordinates are complex numbers. Therefore, by the same analysis that shows the symmetric Maxwell equations (71) through (74) to be form covariant under the real number Lorentz transformation equations (151) through (153), it follows that the asymmetric Maxwell equations (78) through (81) are form covariant under the following complex number Lorentz transformations

$$\bar x' = \bar\gamma(\bar x - \bar v\bar t) \qquad (155)$$

$$\bar t' = \bar\gamma(\bar t - \bar v\bar x/c^2) \qquad (156)$$

where $\bar v$ = complex number relative speed of the two coordinate systems, and the complex number velocity factor for an asymmetric system is given by

$$\bar\gamma = (1 - \bar\beta^2)^{-1/2} \qquad (157)$$

where $\bar\beta = \bar v/c$. Also, simple algebra shows that equations (155) through (157) satisfy

$$\bar x'^2 - c^2\bar t'^2 = \bar x^2 - c^2\bar t^2 \qquad (158)$$
where $\bar x$, $\bar t$, $\bar x'$ and $\bar t'$ are space and time coordinates that exhibit broken internal symmetry, and in general

$$\bar x = xe^{j\theta_x}, \qquad \bar x' = x'e^{j\theta'_x} \qquad (159)$$

$$\bar t = te^{j\theta_t}, \qquad \bar t' = t'e^{j\theta'_t} \qquad (160)$$

and the relative speed of the coordinate systems is written as

$$\bar v = ve^{j\theta_v}, \qquad \bar\beta = \beta e^{j\theta_v} \qquad (161)$$

where $\theta_x$, $\theta'_x$, $\theta_t$, $\theta'_t$ and $\theta_v$ are functions of $P$ and $\theta_P$ of the ambient asymmetric bulk matter or vacuum.
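The form invariance (158) holds for any complex $\bar x$, $\bar t$ and $\bar v$, because $\bar\gamma^2(1 - \bar\beta^2) = 1$ whichever branch of the square root is used for $\bar\gamma$. A minimal Python check, assuming units with $c = 1$ and arbitrary illustrative magnitudes and phases (none of these values come from the text):

```python
import cmath

def complex_lorentz(x, t, v, c=1.0):
    """Eqs. (155)-(157) with complex x, t and v (principal branch of the square root)."""
    gamma = (1 - (v / c) ** 2) ** -0.5
    return gamma * (x - v * t), gamma * (t - v * x / c ** 2)

c = 1.0                     # units with c = 1 (assumption for the illustration)
x = cmath.rect(2.0, 0.10)   # x_bar = x e^{j theta_x}
t = cmath.rect(1.5, 0.05)   # t_bar = t e^{j theta_t}
v = cmath.rect(0.6, 0.08)   # v_bar = v e^{j theta_v}

xp, tp = complex_lorentz(x, t, v, c)
before = x ** 2 - c ** 2 * t ** 2
after = xp ** 2 - c ** 2 * tp ** 2
print(abs(after - before))  # rounding-level difference: eq. (158) holds
```

With real inputs the same function reduces to the standard transformation (151)-(153).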

Combining equations (157) and (161) gives

$$\bar\gamma = (f - jb)^{-1/2} = \left(\frac{f + jb}{f^2 + b^2}\right)^{1/2} = \gamma e^{j\theta_\gamma} \qquad (162)$$

where

$$f = 1 - \beta^2\cos(2\theta_v) \qquad (163)$$

$$b = \beta^2\sin(2\theta_v) \qquad (164)$$

From equation (162) it follows that for an asymmetric system the magnitude and internal phase angle of the velocity factor are given by

$$\gamma = (f^2 + b^2)^{-1/4} = \left[1 - 2\beta^2\cos(2\theta_v) + \beta^4\right]^{-1/4} \qquad (165)$$

$$\tan(2\theta_\gamma) = \frac{\beta^2\sin(2\theta_v)}{1 - \beta^2\cos(2\theta_v)} \qquad (166)$$

Note that $\beta = v/c$, where now $v$ = magnitude of the complex number velocity that appears in equation (161). Also, if $\theta_v = 0$ then equation (165) reduces to equation (153).
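Equations (162) through (166) can be cross-checked by evaluating $\bar\gamma$ directly with complex arithmetic and comparing it with the magnitude and phase formulas. The sketch assumes the principal branch of the complex square root (Python's built-in complex power), which gives $\theta_\gamma = \tfrac{1}{2}\operatorname{atan2}(b, f)$; the text does not specify a branch, and $\theta_v = 5^\circ$ is an arbitrary illustration.

```python
import cmath
import math

def gamma_bar(beta, theta_v):
    """Eq. (157) evaluated with complex arithmetic (principal branch, assumed)."""
    return (1 - cmath.rect(beta, theta_v) ** 2) ** -0.5

def gamma_mag_phase(beta, theta_v):
    """Eqs. (163)-(166): magnitude gamma and internal phase theta_gamma."""
    f = 1 - beta ** 2 * math.cos(2 * theta_v)
    b = beta ** 2 * math.sin(2 * theta_v)
    gamma = (f * f + b * b) ** -0.25
    theta_gamma = 0.5 * math.atan2(b, f)   # atan2 selects the quadrant of 2*theta_gamma
    return gamma, theta_gamma

theta_v = math.radians(5.0)   # illustrative value only
rows = []
for beta in (0.3, 0.99, 1.0, 2.5, 10.0):
    gb = gamma_bar(beta, theta_v)
    g, th = gamma_mag_phase(beta, theta_v)
    rows.append((abs(gb), g, cmath.phase(gb), th))
    print(f"beta={beta}: gamma={g:.5f}, theta_gamma={math.degrees(th):.4f} deg")
```

Setting `theta_v = 0` recovers the real factor (153), for example $\gamma = 1.25$ at $\beta = 0.6$.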

The Lorentz transformations in equations (155) and (156) can be written in the form of real and imaginary components as follows

$$x'\cos\theta'_x = \gamma\left[x\cos(\theta_x + \theta_\gamma) - vt\cos(\theta_v + \theta_t + \theta_\gamma)\right] \qquad (167)$$

$$x'\sin\theta'_x = \gamma\left[x\sin(\theta_x + \theta_\gamma) - vt\sin(\theta_v + \theta_t + \theta_\gamma)\right] \qquad (168)$$

$$t'\cos\theta'_t = \gamma\left[t\cos(\theta_t + \theta_\gamma) - (vx/c^2)\cos(\theta_v + \theta_x + \theta_\gamma)\right] \qquad (169)$$

$$t'\sin\theta'_t = \gamma\left[t\sin(\theta_t + \theta_\gamma) - (vx/c^2)\sin(\theta_v + \theta_x + \theta_\gamma)\right] \qquad (170)$$

where $\gamma$ is given by equation (165). From equations (167) and (168), $x'$ and $\theta'_x$ can be calculated as follows

$$x'^2 = \gamma^2\left[x^2 + v^2t^2 - 2vtx\cos(\theta_v + \theta_t - \theta_x)\right] \qquad (171)$$

$$\tan\theta'_x = \frac{x\sin(\theta_x + \theta_\gamma) - vt\sin(\theta_v + \theta_t + \theta_\gamma)}{x\cos(\theta_x + \theta_\gamma) - vt\cos(\theta_v + \theta_t + \theta_\gamma)} \qquad (172)$$


while from equations (169) and (170), $t'$ and $\theta'_t$ can be calculated in the following manner

$$t'^2 = \gamma^2\left[t^2 + v^2x^2/c^4 - (2vxt/c^2)\cos(\theta_v + \theta_x - \theta_t)\right] \qquad (173)$$

$$\tan\theta'_t = \frac{t\sin(\theta_t + \theta_\gamma) - (vx/c^2)\sin(\theta_v + \theta_x + \theta_\gamma)}{t\cos(\theta_t + \theta_\gamma) - (vx/c^2)\cos(\theta_v + \theta_x + \theta_\gamma)} \qquad (174)$$

From equations (171) and (173) the Minkowski interval can be written as

$$x'^2 - c^2t'^2 = \gamma^2\left[(1 - \beta^2)(x^2 - c^2t^2) - 4vtx\sin\theta_v\sin(\theta_x - \theta_t)\right] \qquad (175)$$

where $\gamma$ is given by equation (165). If $\theta_v = 0$, equation (175) reduces to equation (154).
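Equation (175) can be verified numerically by applying the complex transformation (155)-(156) and comparing $x'^2 - c^2t'^2$, built from the magnitudes, with the right-hand side of (175). A Python sketch with $c = 1$ and arbitrary illustrative magnitudes and phases:

```python
import cmath
import math

def transformed_magnitudes(x, th_x, t, th_t, v, th_v, c=1.0):
    """Apply eqs. (155)-(157); return |x'|, |t'| and gamma = |gamma_bar|."""
    xb, tb, vb = cmath.rect(x, th_x), cmath.rect(t, th_t), cmath.rect(v, th_v)
    gb = (1 - (vb / c) ** 2) ** -0.5
    return abs(gb * (xb - vb * tb)), abs(gb * (tb - vb * xb / c ** 2)), abs(gb)

def interval_rhs_175(x, th_x, t, th_t, v, th_v, gamma, c=1.0):
    """Right-hand side of eq. (175)."""
    beta = v / c
    return gamma ** 2 * ((1 - beta ** 2) * (x ** 2 - c ** 2 * t ** 2)
                         - 4 * v * t * x * math.sin(th_v) * math.sin(th_x - th_t))

# Illustrative values: x, theta_x, t, theta_t, v, theta_v (units with c = 1).
args = (2.0, 0.30, 1.5, 0.10, 0.6, 0.08)
xp, tp, gamma = transformed_magnitudes(*args)
lhs = xp ** 2 - tp ** 2
rhs = interval_rhs_175(*args, gamma)
print(lhs, rhs)   # the two agree to rounding error
```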

Consider now some properties of the velocity factor $\gamma$ given by equation (165). The first thing to see is that $\gamma$ is not singular for real values of $\beta$. In fact it is easy to show that the roots of the denominator in equation (165) are given by

$$\beta^4 - 2\beta^2\cos(2\theta_v) + 1 = 0 \qquad (176)$$

or

$$\beta = \pm e^{\pm j\theta_v} \qquad (177)$$

Only when $\theta_v = 0$ does equation (176) have a real root $\beta = 1$, which agrees with equation (153). By taking the derivative of $\gamma$ given by equation (165) and setting the result equal to zero, it is easy to show that $\gamma$ has a maximum value (assuming $\theta_v$ to be independent of velocity) given by

$$[\gamma]_{\max} = [\sin(2\theta_v)]^{-1/2} \qquad (178)$$

and this maximum value of $\gamma$ occurs at a value of $\beta$ given by

$$[\beta]_{\max\gamma} = [\cos(2\theta_v)]^{1/2} < 1 \qquad (179)$$

Combining equations (166) and (179) gives the following value of $\theta_\gamma$ at the maximum point of $\gamma$

$$[\theta_\gamma]_{\max\gamma} = \pi/4 - \theta_v \qquad (180)$$

The values of $\gamma$ and $\theta_\gamma$ for $\beta = 1$ are obtained from equations (165) and (166) respectively as

$$[\gamma]_{\beta=1} = [2\sin\theta_v]^{-1/2} \qquad (181)$$

$$[\theta_\gamma]_{\beta=1} = \pi/4 - \theta_v/2 \qquad (182)$$

The functions $\gamma$ and $\theta_\gamma$ appear in Figures 1 and 2 respectively. As shown by equation (179), the maximum value of $\gamma$ occurs for $\beta < 1$, and if $\theta_v$ is small the maximum value of $\gamma$ occurs close to $\beta = 1$. Within asymmetric bulk matter or vacuum $\gamma$ is nonsingular. For $\beta \to 0$ it follows from equations (165) and (166) that

$$\gamma \approx 1 + \tfrac{1}{2}\beta^2\cos(2\theta_v) \qquad (183)$$

$$\theta_\gamma \approx \tfrac{1}{2}\beta^2\sin(2\theta_v) \qquad (184)$$

For $\beta \to \infty$ it follows from equations (165) and (166) that

$$\gamma \to 1/\beta \to 0 \qquad (185)$$

$$\theta_\gamma \to \pi/2 - \theta_v \qquad (186)$$

where $\theta_v$ is assumed to be independent of particle velocity. It is assumed that $\theta_v$ depends on $P$ and $\theta_P$ of the ambient medium.
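The properties (178) through (186) can be confirmed by a brute-force scan of equation (165) over a grid of $\beta$. This Python sketch assumes, as the text does, that $\theta_v$ is independent of velocity; $\theta_v = 5^\circ$ and the grid spacing are illustrative choices.

```python
import math

def gamma_mag_phase(beta, theta_v):
    """Eqs. (165)-(166): magnitude and internal phase of the velocity factor."""
    f = 1 - beta ** 2 * math.cos(2 * theta_v)
    b = beta ** 2 * math.sin(2 * theta_v)
    return (f * f + b * b) ** -0.25, 0.5 * math.atan2(b, f)

theta_v = math.radians(5.0)                     # illustrative, velocity independent
betas = [k * 1e-4 for k in range(1, 30001)]     # brute-force grid 0 < beta <= 3
g_max, beta_at_max = max((gamma_mag_phase(b, theta_v)[0], b) for b in betas)

print(g_max, math.sin(2 * theta_v) ** -0.5)          # eq. (178)
print(beta_at_max, math.cos(2 * theta_v) ** 0.5)     # eq. (179)
g1, th1 = gamma_mag_phase(1.0, theta_v)
print(g1, (2 * math.sin(theta_v)) ** -0.5)           # eq. (181)
print(th1, math.pi / 4 - theta_v / 2)                # eq. (182)
```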

The complex number de Broglie wavelength for a relativistic particle moving at velocity $\bar v$ is

$$\bar\lambda = h/\bar p = h/(m\bar\gamma\bar v) = \lambda e^{j\theta_\lambda} \qquad (187)$$

where $\bar p$ = complex number momentum whose magnitude is given by $p = m\gamma v$, and $h$ = Planck's constant. From equation (187) it follows that

$$\lambda = h/(m\gamma v) = \lambda_c/(\gamma\beta) \qquad (188)$$

$$\theta_\lambda = -\theta_\gamma - \theta_v \qquad (189)$$

where $\lambda_c$ = Compton wavelength given by

$$\lambda_c = h/(mc) \qquad (190)$$

Three special cases can be considered.

Case 1.

$$\beta = [\beta]_{\max\gamma} = [\cos(2\theta_v)]^{1/2} \qquad (191)$$

It follows from equations (178), (179), (180), (188), and (189) that

$$\lambda = \lambda_c[\tan(2\theta_v)]^{1/2} \qquad (192)$$

$$\theta_\lambda = -\pi/4 \qquad (193)$$

Case 2.

$$\beta = 1 \qquad (194)$$

In this case it follows from equations (181), (182) and (188) that

$$\lambda = \lambda_c(2\sin\theta_v)^{1/2} \qquad (195)$$

$$\theta_\lambda = -\pi/4 - \theta_v/2 \qquad (196)$$

Case 3.

$$\beta \to \infty \qquad (197)$$

In the limit $\beta \to \infty$, equations (185), (186), (188) and (189) give

$$\lambda \to \lambda_c \qquad (198)$$

$$\theta_\lambda \to -\pi/2 \qquad (199)$$

Therefore as $\beta$ increases without limit the de Broglie wavelength increases to a limiting value of $\lambda_c$. It is assumed that $\theta_v$ is independent of velocity.
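The three de Broglie cases follow from (188) and (189) once $\gamma$ and $\theta_\gamma$ are known. A short Python sketch, with wavelengths expressed in units of $\lambda_c$ and an illustrative $\theta_v = 5^\circ$; a very large $\beta$ stands in for the limit $\beta \to \infty$:

```python
import math

def gamma_mag_phase(beta, theta_v):
    """Eqs. (165)-(166)."""
    f = 1 - beta ** 2 * math.cos(2 * theta_v)
    b = beta ** 2 * math.sin(2 * theta_v)
    return (f * f + b * b) ** -0.25, 0.5 * math.atan2(b, f)

def de_broglie(beta, theta_v):
    """Eqs. (188)-(189): lambda / lambda_c and theta_lambda."""
    gamma, theta_gamma = gamma_mag_phase(beta, theta_v)
    return 1.0 / (gamma * beta), -theta_gamma - theta_v

theta_v = math.radians(5.0)   # illustrative value
cases = {
    "Case 1, beta = [beta]_max_gamma": math.cos(2 * theta_v) ** 0.5,
    "Case 2, beta = 1": 1.0,
    "Case 3, beta -> infinity (beta = 1e6)": 1e6,
}
for name, beta in cases.items():
    lam, th = de_broglie(beta, theta_v)
    print(f"{name}: lambda/lambda_c = {lam:.5f}, theta_lambda = {th:.5f} rad")
```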

The total energy of a particle located in asymmetric bulk matter or vacuum is given by the following generalization of the standard results of special relativity

$$\bar\varepsilon = \varepsilon e^{j\theta_\varepsilon} = \bar\gamma mc^2 \qquad (200)$$

where the complex number velocity factor is given by equation (157), and $m$ = proper mass. Therefore the total energy of a particle has the same properties as $\bar\gamma$, so that

$$\varepsilon = \gamma mc^2 \qquad (201)$$

$$\theta_\varepsilon = \theta_\gamma \qquad (202)$$

where the magnitude and internal phase angle of the velocity factor are given by equations (165) and (166) respectively. The kinetic energy of a particle in asymmetric bulk matter or vacuum is given by

$$\bar\varepsilon_K = (\bar\gamma - 1)mc^2 \qquad (203)$$

The component form of equation (203) is written as

$$\varepsilon_K\cos\theta_K = (\gamma\cos\theta_\gamma - 1)mc^2 \qquad (204)$$

$$\varepsilon_K\sin\theta_K = \gamma\sin\theta_\gamma\,mc^2 \qquad (205)$$

and therefore for a broken symmetry system

$$\tan\theta_K = \frac{\gamma\sin\theta_\gamma}{\gamma\cos\theta_\gamma - 1} \qquad (206)$$

$$\varepsilon_K = mc^2\left(\gamma^2 - 2\gamma\cos\theta_\gamma + 1\right)^{1/2} \qquad (207)$$

Placing equations (183) and (184) into equations (206) and (207) shows that for $\beta \to 0$

$$\theta_K \to 2\theta_v \qquad (208)$$

$$\varepsilon_K \approx \tfrac{1}{2}mv^2 \qquad (209)$$

which agrees with the nonrelativistic limit obtained directly from equations (157) and (203), namely

$$\bar\varepsilon_K \approx \tfrac{1}{2}m\bar v^2 \qquad (210)$$

Figures 3 and 4 show $\varepsilon_K$ and $\theta_K$ in terms of $\beta$.
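The component formulas (206)-(207) and the limits (208)-(209) can be checked against a direct complex evaluation of (203). This Python sketch assumes $m = c = 1$ and uses illustrative values of $\beta$ and $\theta_v$; the principal branch of the square root is again an assumption.

```python
import cmath
import math

def kinetic_energy(beta, theta_v, m=1.0, c=1.0):
    """Eq. (203) evaluated with complex numbers: (gamma_bar - 1) m c^2."""
    gamma_bar = (1 - cmath.rect(beta, theta_v) ** 2) ** -0.5
    return (gamma_bar - 1) * m * c ** 2, gamma_bar

def kinetic_energy_components(gamma, theta_gamma, m=1.0, c=1.0):
    """Eqs. (206)-(207): magnitude and phase of the kinetic energy."""
    eps_k = m * c ** 2 * math.sqrt(gamma ** 2 - 2 * gamma * math.cos(theta_gamma) + 1)
    theta_k = math.atan2(gamma * math.sin(theta_gamma), gamma * math.cos(theta_gamma) - 1)
    return eps_k, theta_k

theta_v = 0.1                                    # illustrative internal phase (rad)
eps_bar, g_bar = kinetic_energy(0.8, theta_v)
eps_k, theta_k = kinetic_energy_components(abs(g_bar), cmath.phase(g_bar))
print(abs(eps_bar), eps_k, cmath.phase(eps_bar), theta_k)

# Nonrelativistic limit, eqs. (208)-(209): theta_K -> 2 theta_v and eps_K -> v^2 / 2.
small, _ = kinetic_energy(1e-3, theta_v)
print(cmath.phase(small), 2 * theta_v, abs(small), 0.5 * 1e-3 ** 2)
```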

Specific values of the total energy, kinetic energy, and momentum will now be considered for some characteristic values of $\beta$.

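Case 1. $\beta \to 0$

$$\gamma \approx 1 + (\beta^2/2)\cos(2\theta_v) \qquad (211)$$

$$\theta_\gamma \approx (\beta^2/2)\sin(2\theta_v) \qquad (212)$$

$$\varepsilon \approx mc^2 + \tfrac{1}{2}mv^2\cos(2\theta_v) \qquad (212A)$$

$$\theta_\varepsilon \approx (\beta^2/2)\sin(2\theta_v) \qquad (212B)$$

$$\varepsilon_K \approx \tfrac{1}{2}mv^2 \qquad (212C)$$

$$\theta_K \approx 2\theta_v \qquad (212D)$$

$$p \approx mv \qquad (212E)$$

$$\theta_p \approx \theta_v \qquad (212F)$$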

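Case 2. $\beta = [\beta]_{\max\gamma} = [\cos(2\theta_v)]^{1/2}$

$$\gamma = [\sin(2\theta_v)]^{-1/2} \qquad (213)$$

$$\theta_\gamma = \pi/4 - \theta_v \qquad (214)$$

$$\varepsilon = mc^2[\sin(2\theta_v)]^{-1/2} \qquad (215)$$

$$\theta_\varepsilon = \pi/4 - \theta_v \qquad (216)$$

$$\varepsilon_K = mc^2\left[\frac{1}{\sin(2\theta_v)} - \frac{2\cos(\pi/4 - \theta_v)}{\sqrt{\sin(2\theta_v)}} + 1\right]^{1/2} \qquad (217)$$

$$\varepsilon_K \approx \varepsilon \quad \text{for small } \theta_v \qquad (218A)$$

$$\tan\theta_K = \frac{\sin(\pi/4 - \theta_v)}{\cos(\pi/4 - \theta_v) - \sqrt{\sin(2\theta_v)}} \qquad (218B)$$

$$\theta_K \approx \theta_\gamma \quad \text{for small } \theta_v \qquad (218C)$$

$$p/mc = \gamma\beta = [\cot(2\theta_v)]^{1/2} \qquad (218D)$$

$$\theta_p = \pi/4 \qquad (218E)$$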

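Case 3. $\beta = 1$

$$\gamma = (2\sin\theta_v)^{-1/2} \qquad (219)$$

$$\theta_\gamma = \pi/4 - \theta_v/2 \qquad (220)$$

$$\varepsilon = mc^2(2\sin\theta_v)^{-1/2} = cp \qquad (221)$$

$$\theta_\varepsilon = \pi/4 - \theta_v/2 \qquad (222)$$

$$\varepsilon_K = mc^2\left[\frac{1}{2\sin\theta_v} - \frac{2\cos(\pi/4 - \theta_v/2)}{\sqrt{2\sin\theta_v}} + 1\right]^{1/2} \qquad (223)$$

$$\varepsilon_K \approx \varepsilon \quad \text{for small } \theta_v \qquad (224)$$

$$\tan\theta_K = \frac{\sin(\pi/4 - \theta_v/2)}{\cos(\pi/4 - \theta_v/2) - \sqrt{2\sin\theta_v}} \qquad (224A)$$

$$\theta_K \approx \theta_\gamma \quad \text{for small } \theta_v \qquad (224B)$$

$$p/mc = \gamma\beta = (2\sin\theta_v)^{-1/2} \qquad (224C)$$

$$\theta_p = \pi/4 + \theta_v/2 \qquad (224D)$$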

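Case 4. $\beta \to \infty$

$$\gamma \to 1/\beta \to 0 \qquad (225)$$

$$\theta_\gamma \to \pi/2 - \theta_v \qquad (226)$$

$$\varepsilon \to mc^2/\beta \to 0 \qquad (227)$$

$$\theta_\varepsilon \to \pi/2 - \theta_v \qquad (228)$$

$$\varepsilon_K \to mc^2\left[1 - (2/\beta)\cos(\pi/2 - \theta_v)\right]^{1/2} \to mc^2 \qquad (229)$$

$$\theta_K \to \pi \qquad (230A)$$

$$p/mc = \gamma\beta \to 1 \qquad (230B)$$

$$\theta_p \to \pi/2 \qquad (230C)$$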

In direct analogy to the standard expression for relativistic momentum, the momentum of a particle located in asymmetric bulk matter or vacuum is written as

$$\bar p = pe^{j\theta_p} = m\bar\gamma\bar v \qquad (231)$$

so that

$$p = m\gamma v \qquad (232)$$

$$\theta_p = \theta_\gamma + \theta_v \qquad (233)$$


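where $\gamma$ and $\theta_\gamma$ are given by equations (165) and (166) respectively. From equations (157), (200) and (231) it follows that the single particle energy is

$$\bar\varepsilon^2 = c^2\bar p^2 + m^2c^4 \qquad (234)$$

which shows the four-vector status of $\bar\varepsilon$ and $\bar p$. Equation (234) has two component equations

$$\varepsilon^2\cos(2\theta_\varepsilon) = c^2p^2\cos(2\theta_p) + m^2c^4 \qquad (235)$$

$$\varepsilon^2\sin(2\theta_\varepsilon) = c^2p^2\sin(2\theta_p) \qquad (236)$$

Equations (231) through (236) are equivalent to equations (165), (166), (205) and (206). From equations (235) and (236) it follows that

$$\tan(2\theta_\varepsilon) = \frac{p^2\sin(2\theta_p)}{p^2\cos(2\theta_p) + m^2c^2} \qquad (237)$$

$$\frac{\varepsilon}{mc^2} = \left[\left(\frac{p}{mc}\right)^4 + 2\left(\frac{p}{mc}\right)^2\cos(2\theta_p) + 1\right]^{1/4} \qquad (238)$$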

It should be remembered that for asymmetric matter or vacuum, an interaction potential and a gauge potential need to be added to obtain the total single particle energy

$$\bar\varepsilon_1 = \bar\varepsilon + \bar V_e + \bar V_g \qquad (239)$$

Equation (232) can be written as

$$p/mc = \gamma\beta \qquad (240A)$$

Combining equations (165) and (240A) and setting the derivative of the momentum equal to zero gives the following value of $\beta$ for maximum momentum
$$[\beta]_{\max p} = [\cos(2\theta_v)]^{-1/2} = 1/[\beta]_{\max\gamma} > 1 \qquad (240B)$$

for which

$$[\gamma]_{\max p} = [\cot(2\theta_v)]^{1/2} = [p/mc]_{\max\gamma} \qquad (240C)$$

so that the maximum momentum is

$$[p/mc]_{\max} = [\sin(2\theta_v)]^{-1/2} = [\gamma]_{\max}$$

where equation (178) has been used. Figures 5 and 6 give $p$ and $\theta_p$ in terms of $\beta$.
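A grid search over $\beta$ reproduces the momentum maximum (240B) through (240C) and the maximum momentum value. As before, this Python sketch assumes $\theta_v$ independent of velocity and uses an illustrative $\theta_v = 5^\circ$:

```python
import math

def gamma_mag(beta, theta_v):
    """Eq. (165)."""
    return (1 - 2 * beta ** 2 * math.cos(2 * theta_v) + beta ** 4) ** -0.25

theta_v = math.radians(5.0)                    # illustrative value
betas = [k * 1e-4 for k in range(1, 30001)]    # brute-force grid 0 < beta <= 3
# p/mc = gamma * beta, eq. (240A); keep the largest value and where it occurs.
p_max, beta_p = max((gamma_mag(b, theta_v) * b, b) for b in betas)

print(beta_p, math.cos(2 * theta_v) ** -0.5)            # eq. (240B)
print(gamma_mag(math.cos(2 * theta_v) ** -0.5, theta_v),
      (1 / math.tan(2 * theta_v)) ** 0.5)               # eq. (240C)
print(p_max, math.sin(2 * theta_v) ** -0.5)             # maximum momentum = [gamma]_max
```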

The following arguments show how numerical values of $\theta_v$ for the asymmetric vacuum can be obtained from the experimental results of the Michelson-Morley experiment. The generalization of the standard relativistic velocity addition formula to a system with broken internal symmetry is

$$\bar w = \frac{\bar u + \bar v}{1 + \bar u\bar v/c^2} \qquad (241)$$

where $\bar u$ = particle velocity relative to a reference frame that itself is moving at a velocity $\bar v$, and $\bar w$ = particle velocity relative to a frame of reference from which the moving frame has a velocity $\bar v$. Writing the velocities as

$$\bar w = we^{j\theta_w}, \qquad \bar u = ue^{j\theta_u}, \qquad \bar v = ve^{j\theta_v} \qquad (241A)$$

gives the following velocity addition formula for asymmetric bulk matter or vacuum

$$we^{j\theta_w} = \frac{A + jB}{C + jD} \qquad (242)$$

$$w = \left(\frac{A^2 + B^2}{C^2 + D^2}\right)^{1/2} \qquad (243)$$

$$\theta_w = \theta_N - \theta_D \qquad (244)$$

$$\tan\theta_N = B/A \qquad (245)$$

$$\tan\theta_D = D/C \qquad (246)$$
(246) 
where

$$A = u\cos\theta_u + v\cos\theta_v \qquad (247)$$

$$B = u\sin\theta_u + v\sin\theta_v \qquad (248)$$

$$C = 1 + (uv/c^2)\cos(\theta_u + \theta_v) \qquad (249)$$

$$D = (uv/c^2)\sin(\theta_u + \theta_v) \qquad (250)$$

It is easy to show that

$$A^2 + B^2 = u^2 + v^2 + 2uv\cos(\theta_u - \theta_v) \qquad (251)$$

$$C^2 + D^2 = 1 + u^2v^2/c^4 + (2uv/c^2)\cos(\theta_u + \theta_v) \qquad (252)$$

It will be assumed that the internal phase of the particle velocity is independent of the magnitude of the velocity, so that $\theta_u = \theta_v = \theta$ and

$$A^2 + B^2 = (u + v)^2 \qquad (253)$$

$$C^2 + D^2 = 1 + u^2v^2/c^4 + (2uv/c^2)\cos(2\theta) \qquad (254)$$

Consider the case $u = c$ and $v = c$; then equations (243), (253) and (254) give

$$w = c/\cos\theta \qquad (255)$$

For the case $\theta = 0$, the standard result $w = c$ is regained.

In order to determine $\theta$, consider the case where $u = c$ and $v$ = speed of the earth in its orbit, which is much less than $c$. From equations (253) and (254) it follows that for this case

$$A^2 + B^2 = (c + v)^2 \qquad (256A)$$

$$C^2 + D^2 = 1 + v^2/c^2 + (2v/c)\cos(2\theta) = (1 + v/c)^2 - (4v/c)\sin^2\theta \qquad (256B)$$

From equation (243) it follows that

$$w = \frac{c + v}{\left[(1 + \beta)^2 - 4\beta\sin^2\theta\right]^{1/2}} \qquad (257)$$

$$\approx c\left[1 + \frac{2\beta\sin^2\theta}{(1 + \beta)^2}\right] \qquad (258)$$

$$\approx c\left(1 + 2\beta\sin^2\theta - \cdots\right) \qquad (259)$$

where $\beta = v/c \ll 1$. The Galilean result would be $w = c + v = c(1 + \beta)$ instead of equation (257), while the standard special relativistic result without broken symmetry would be $w = c$, which can be regained from equations (257) through (259) by taking $\theta = 0$.
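The velocity addition results (255) and (257)-(259) can be checked directly from the complex formula (241). In this Python sketch $c = 1$, both velocities share the phase $\theta$ as assumed in the text, and $\theta = 9.1^\circ$ and $\beta = 10^{-4}$ are illustrative choices:

```python
import cmath
import math

def add_velocities(u, v, c=1.0):
    """Eq. (241) with complex velocities u_bar and v_bar."""
    return (u + v) / (1 + u * v / c ** 2)

c = 1.0
theta = math.radians(9.1)    # illustrative common phase, theta_u = theta_v = theta

# Eq. (255): u = v = c (both with phase theta) gives |w| = c / cos(theta).
w = add_velocities(cmath.rect(c, theta), cmath.rect(c, theta), c)
print(abs(w), c / math.cos(theta))

# Eqs. (257)-(259): u = c, v = beta * c with beta << 1.
beta = 1e-4
w_small = add_velocities(cmath.rect(c, theta), cmath.rect(beta * c, theta), c)
exact_257 = (c + beta * c) / math.sqrt((1 + beta) ** 2 - 4 * beta * math.sin(theta) ** 2)
first_order_259 = c * (1 + 2 * beta * math.sin(theta) ** 2)
print(abs(w_small), exact_257, first_order_259)
```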

The details of the Michelson-Morley experiment are described in many references, and only the briefest description will be given here. Using the Galilean assumption $w = c + v$ and $w = c - v$ respectively for the speed of light propagating with and against the ether, the number of interference fringes to be expected in a Michelson interferometer whose arms are parallel and perpendicular to the earth's motion is given according to the Galilean assumption by

$$N_G = -(2L/\lambda)\beta^2 \qquad (260)$$

where $\lambda$ = wavelength of light, and $L$ = length of the arms of the interferometer. The experimental value $N_E$ of the number of fringes has been getting smaller relative to $N_G$ as more accurate experiments are performed, and following Reference 10, $N_E$ is given by

$$N_E \approx N_G/400 \qquad (261)$$

On the other hand, for the broken symmetry vacuum case, equation (259), the predicted number of interference fringes $N_{BS}$ is given by

$$N_{BS} = -(2L/\lambda)(2\beta\sin^2\theta)^2 = -(8L/\lambda)\beta^2\sin^4\theta \qquad (262)$$

If it is assumed that $N_{BS} = N_E$, it follows from equations (260) through (262) that

$$\sin^4\theta \approx \frac{1}{1600} \qquad (263)$$

$$\theta \approx 9.1^\circ \qquad (264)$$

Since future Michelson-Morley experiments may find values of $N_E$ lower than the one used in this paper, one can conclude that $\theta = \theta_v^{(v)} \lesssim 9^\circ$ for the broken symmetry angle of particle velocity in the vacuum. The Michelson-Morley experiment not only shows that the Galilean velocity addition formula $c \pm v$ for a light source is invalid, but gives a positive result in the form of an upper limit to the value of the velocity asymmetry angle $\theta_v$.
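The arithmetic behind (263) and (264) is short: dividing (262) by (260) gives $N_{BS}/N_G = 4\sin^4\theta$, and setting $N_{BS} = N_E = N_G/400$ gives $\sin^4\theta = 1/1600$. A Python sketch that reproduces the number:

```python
import math

# Ratio N_E / N_G taken from eq. (261).
ratio = 1 / 400

# Eqs. (260) and (262): N_BS / N_G = 4 sin^4(theta); setting N_BS = N_E gives eq. (263).
sin4 = ratio / 4                                   # = 1/1600
theta_deg = math.degrees(math.asin(sin4 ** 0.25))
print(f"sin^4(theta) = 1/{1 / sin4:.0f}, theta = {theta_deg:.2f} deg")   # theta is about 9.1 deg
```

With a smaller experimental ratio $N_E/N_G$ the same lines give a smaller angle, which is the sense in which (264) acts as an upper limit.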

Alternatively, measurements of the velocity factor $\gamma$ from particle accelerator experiments may eventually produce a maximum value of $\gamma$, which would immediately determine the value of $\theta_v$ through equation (178). For this, particles with $\beta > 1$ would have to be observed. If no such particles are ever found, it would show that $\theta_v = 0$ for the vacuum, and that the vacuum is symmetric. Only experiment can resolve this issue. Experiments to determine $\theta_v$ for asymmetric bulk matter may be easier because $\theta_v$ for bulk matter is expected to be larger than for the vacuum. Note that astronomical objects with $\beta > 1$ have apparently already been observed, and their explanation in terms of conventional effects can be given only with much difficulty.

Finally, the laws of motion of a relativistic particle in asymmetric bulk matter or vacuum are considered. Newton's law of motion is modified by special relativity to give the following dynamical equation of motion for a force in the direction of motion

$$F_s = \frac{d}{dt_s}(m\gamma_s v_s) = m\gamma_s^3\,\frac{dv_s}{dt_s} = m\gamma_s^3 a_s \qquad (265)$$

where $a_s$ = conventionally calculated acceleration, and $\gamma_s$ is given by equation (153). The generalization of this equation to the case of particle motion in asymmetric bulk matter or vacuum is

$$\bar F = \frac{d}{d\bar t}(m\bar\gamma\bar v) = m\bar\gamma^3\,\frac{d\bar v}{d\bar t} = m\bar\gamma^3\bar a \qquad (266)$$

where $\bar\gamma$ is given by equation (157), and where $\bar t$, $\bar v$, $\bar a$, and $\bar F$ are the gauge rotated time, velocity, acceleration and force respectively. Therefore

$$\bar a = ae^{j\theta_a} \qquad (267)$$

$$\bar F = Fe^{j\theta_F} \qquad (268)$$

Combining equation (162) with equations (266) through (268) gives the force in the direction of motion as

$$F = m\gamma^3 a \qquad (269)$$

$$\theta_F = 3\theta_\gamma + \theta_a \qquad (270)$$

where $\gamma$ and $\theta_\gamma$ are given by equations (165) and (166) respectively.
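The step from (266) to (269)-(270) uses $d(\bar\gamma\bar v)/d\bar v = \bar\gamma^3$, which holds for complex $\bar v$ because the expression is analytic away from the branch points. The Python sketch checks this with a central finite difference (an illustrative numerical choice) and confirms that $|\bar\gamma^3| = \gamma^3$ and $\arg\bar\gamma^3 = 3\theta_\gamma$; units with $c = 1$ and the velocity value are assumptions for the illustration.

```python
import cmath

def gamma_bar(v, c=1.0):
    """Eq. (157)."""
    return (1 - (v / c) ** 2) ** -0.5

def momentum_per_mass(v, c=1.0):
    """gamma_bar * v_bar, the quantity differentiated in eq. (266)."""
    return gamma_bar(v, c) * v

v = cmath.rect(0.7, 0.1)   # illustrative complex velocity, c = 1
h = 1e-6                   # step for a central difference along the real axis
# For an analytic function the derivative is the same in every direction in the complex plane.
deriv = (momentum_per_mass(v + h) - momentum_per_mass(v - h)) / (2 * h)
g3 = gamma_bar(v) ** 3

print(abs(deriv - g3))                                   # small: d(gamma v)/dv = gamma^3
print(abs(g3), abs(gamma_bar(v)) ** 3)                   # eq. (269): |gamma_bar^3| = gamma^3
print(cmath.phase(g3), 3 * cmath.phase(gamma_bar(v)))    # eq. (270) without theta_a
```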
5. ELECTROMAGNETIC WAVE EQUATIONS. A direct result of Maxwell's equations is a set of wave equations that describe the time and space dependence of the electric and magnetic field vectors in a material body or vacuum.

This section considers the construction of electromagnetic wave equations for matter and radiation with broken internal symmetries. The standard equation of telegraphy that determines the electric (or magnetic) field in a conducting medium is

$$\nabla^2E^a_\alpha - \varepsilon^a\mu^a\frac{\partial^2E^a_\alpha}{\partial t^2} - \mu^a\sigma^a\frac{\partial E^a_\alpha}{\partial t} = 0 \qquad (271)$$

where $\alpha = x$, $y$ and $z$, and $\sigma^a$ = unrenormalized conductivity. The Laplacian operator is defined as

$$\nabla^2 = \sum_\alpha\frac{\partial^2}{\partial\alpha^2} \qquad (272)$$


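The prescription introduced in this paper to handle electromagnetism in matter or vacuum with broken internal symmetries is to use gauge rotated field vectors and gauge rotated space and time coordinates. Applying this prescription to equation (271) yields

$$\bar\nabla^2\bar E_\alpha = \varepsilon\mu\frac{\partial^2\bar E_\alpha}{\partial\bar t^2} + \mu\sigma\frac{\partial\bar E_\alpha}{\partial\bar t} \qquad (273)$$

The complex number Laplacian is given by

$$\bar\nabla^2 = \sum_\alpha\frac{\partial^2}{\partial\bar\alpha^2} \qquad (274)$$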


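The first and second derivative terms in equation (273) have already been evaluated in Section 2. Using the notation developed in equations (54) through (59) and (65) through (70) allows equation (273) to be written as six real number relations as follows

$$\sum_\beta R_{2\beta}(E_{xR},E_{xI}) = \varepsilon\mu R_{2t}(E_{xR},E_{xI}) + \mu\sigma R_{1t}(E_{xR},E_{xI}) \qquad (275)$$

$$\sum_\beta I_{2\beta}(E_{xR},E_{xI}) = \varepsilon\mu I_{2t}(E_{xR},E_{xI}) + \mu\sigma I_{1t}(E_{xR},E_{xI}) \qquad (276)$$

$$\sum_\beta R_{2\beta}(E_{yR},E_{yI}) = \varepsilon\mu R_{2t}(E_{yR},E_{yI}) + \mu\sigma R_{1t}(E_{yR},E_{yI}) \qquad (277)$$

$$\sum_\beta I_{2\beta}(E_{yR},E_{yI}) = \varepsilon\mu I_{2t}(E_{yR},E_{yI}) + \mu\sigma I_{1t}(E_{yR},E_{yI}) \qquad (278)$$

$$\sum_\beta R_{2\beta}(E_{zR},E_{zI}) = \varepsilon\mu R_{2t}(E_{zR},E_{zI}) + \mu\sigma R_{1t}(E_{zR},E_{zI}) \qquad (279)$$

$$\sum_\beta I_{2\beta}(E_{zR},E_{zI}) = \varepsilon\mu I_{2t}(E_{zR},E_{zI}) + \mu\sigma I_{1t}(E_{zR},E_{zI}) \qquad (280)$$

where the sum is over $\beta = x$, $y$ and $z$.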

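The standard equations that determine the electromagnetic potentials are written as

$$\nabla^2\phi^a - \varepsilon^a\mu^a\frac{\partial^2\phi^a}{\partial t^2} = -\rho_q^a/\varepsilon^a \qquad (281)$$

$$\nabla^2A^a_\alpha - \varepsilon^a\mu^a\frac{\partial^2A^a_\alpha}{\partial t^2} = -\mu^a j^a_\alpha \qquad (282)$$

The generalization of these equations to electromagnetic fields in asymmetric bulk matter or vacuum is as follows

$$\bar\nabla^2\bar\phi - \varepsilon\mu\frac{\partial^2\bar\phi}{\partial\bar t^2} = -\rho_q/\varepsilon \qquad (283)$$

$$\bar\nabla^2\bar A_\alpha - \varepsilon\mu\frac{\partial^2\bar A_\alpha}{\partial\bar t^2} = -\mu\bar j_\alpha \qquad (284)$$

where the complex number electromagnetic potentials are written as

$$\bar\phi = \phi_R + j\phi_I \qquad (285)$$

$$\bar A_\alpha = A_{\alpha R} + jA_{\alpha I} \qquad (286)$$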

Using the notation of equations (65) and (66) allows equation (283) to be written as the following two relations

(287)

(288)


while  equation  (284)  can  be  written  as  the  following  six  approximations 


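$$\sum_\beta R_{2\beta}(A_{xR},A_{xI}) - \varepsilon\mu R_{2t}(A_{xR},A_{xI}) = -\mu j_{xR} \qquad (289)$$

$$\sum_\beta R_{2\beta}(A_{yR},A_{yI}) - \varepsilon\mu R_{2t}(A_{yR},A_{yI}) = -\mu j_{yR} \qquad (290)$$

$$\sum_\beta R_{2\beta}(A_{zR},A_{zI}) - \varepsilon\mu R_{2t}(A_{zR},A_{zI}) = -\mu j_{zR} \qquad (291)$$

$$\sum_\beta I_{2\beta}(A_{xR},A_{xI}) - \varepsilon\mu I_{2t}(A_{xR},A_{xI}) = -\mu j_{xI} \qquad (292)$$

$$\sum_\beta I_{2\beta}(A_{yR},A_{yI}) - \varepsilon\mu I_{2t}(A_{yR},A_{yI}) = -\mu j_{yI} \qquad (293)$$

$$\sum_\beta I_{2\beta}(A_{zR},A_{zI}) - \varepsilon\mu I_{2t}(A_{zR},A_{zI}) = -\mu j_{zI} \qquad (294)$$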


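Finally, the gauge condition for an electromagnetic field with broken internal symmetry is written as

$$\bar\nabla\cdot\bar{\mathbf A} + \varepsilon\mu\frac{\partial\bar\phi}{\partial\bar t} = 0 \qquad (295)$$

which can be written in terms of real and imaginary components as

$$\sum_\beta R_{1\beta}(A_{\beta R},A_{\beta I}) + \varepsilon\mu R_{1t}(\phi_R,\phi_I) = 0 \qquad (296)$$

$$\sum_\beta I_{1\beta}(A_{\beta R},A_{\beta I}) + \varepsilon\mu I_{1t}(\phi_R,\phi_I) = 0 \qquad (297)$$

where the sum is over $\beta = x$, $y$ and $z$.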

6.  VACUUM  WITH  BROKEN  INTERNAL  SYMMETRIES.  Of  special  importance  to  the 
propagation  of  electromagnetic  waves  are  the  properties  of  the  vacuum  state. 
The  vacuum  state  may  exhibit  the  same  broken  internal  symmetries  as  does  bulk 
matter. Consider the vacuum state to have a zero temperature state coupled to
a  thermal  state  in  such  a  way  that  the  vacuum  energy  density  and  pressure  for 
low  temperatures  are  given  by 


$$\bar E^{(v)} \simeq \bar E_o^{(v)} + \bar E_j^{(v)}\,T^j \qquad (298)$$

$$\bar P^{(v)} \simeq \bar P_o^{(v)} + \bar P_j^{(v)}\,T^j \qquad (299)$$

where $\bar E^{(v)}$ and $\bar P^{(v)}$ = vacuum energy density and pressure respectively, $\bar E_o^{(v)}$ and $\bar P_o^{(v)}$ = zero temperature vacuum energy density and pressure respectively, and $\bar E_j^{(v)}$ and $\bar P_j^{(v)}$ = thermal coefficients for the vacuum energy density and pressure respectively. The vacuum Grüneisen parameter is defined by


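$$\bar\gamma_o^{(v)} = \gamma_o^{(v)}e^{j\theta_{\gamma o}} = \left[\frac{V\,\partial\bar P^{(v)}/\partial T}{\partial\bar U^{(v)}/\partial T}\right]_{T=0} = \frac{\bar P_j^{(v)}}{\bar E_j^{(v)}} \qquad (300)$$

where $\bar U^{(v)} = V\bar E^{(v)}$, and where j = index that describes the thermal properties of the vacuum.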

The energy density $\bar E_o^{(v)}$ and Grüneisen parameter $\bar\gamma_o^{(v)}$ for the zero temperature vacuum are calculated from the simultaneous solution of two differential equations

[Equations (301) and (302) are illegible in the source scan.]


which are just equations (252) and (253) of Reference 8 with their right-hand sides set equal to zero. A trivial solution of equations (252) and (253) of Reference 8 with their right-hand sides equal to zero is just $\bar E^{(v)} = 0$ and $\bar P^{(v)} = 0$, which is equivalent to the unrenormalized vacuum $E_a = 0$ and $P_a = 0$.

A non-trivial solution is obtained by simultaneously solving equations (301) and (302). It is easy to show that equation (301) can be written as


[Equation (303) is illegible in the source scan.]


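The corresponding radiation quantities for the vacuum are

$$\bar E_r^{(v)} \simeq \bar E_{or}^{(v)} + \bar E_{jr}^{(v)}\,T^j \qquad (304)$$

$$\bar P_r^{(v)} \simeq \bar P_{or}^{(v)} + \bar P_{jr}^{(v)}\,T^j \qquad (305)$$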


while the zero temperature radiation Grüneisen parameter for the vacuum is given by

$$\bar\gamma_{or}^{(v)} = \gamma_{or}^{(v)}\,e^{j\theta_{\gamma or}} \qquad (306)$$
The vacuum radiation equations are then written as

[Equations (307) and (308) are illegible in the source scan.]

which are just equations (287) and (288) of Reference 8 with their right-hand sides set equal to zero and a superscript (v) added to indicate a vacuum solution.


Therefore, in principle, the asymmetric vacuum state is formally identical to the asymmetric bulk matter state. In fact, the vacuum is simpler than the bulk matter state, as can be seen by comparing equations (301), (302), (307) and (308) with equations (252), (253), (287) and (288), respectively, of Reference 8. The vacuum is expected to exhibit a broken internal symmetry state that is described by internal phase angles such as $\theta_\gamma^{(v)}$. The broken symmetry of the vacuum will impress broken symmetries on the kinematic and dynamic variables of particles moving in the vacuum. Similarly, electromagnetic waves in the vacuum are expected to possess electric and magnetic fields and a spacetime coordinate description that exhibit internal phase angles.

7. CONCLUSION. The effects of the broken symmetry of space and time on electromagnetism in matter and the vacuum are considered, and Maxwell's equations with broken internal symmetries are developed. The Lorentz covariance of these equations is assumed to be valid but must now be represented in the form of complex number Lorentz transformations. The results of the Michelson-Morley experiment can be used to place a limit on the magnitude of the internal phase angle of the velocity of a particle moving in the vacuum, but more accurate experiments are required. Experiments conducted in asymmetric bulk matter may be fruitful because the internal symmetries of spacetime are larger in this case than for the vacuum. The wave equations and gauge conditions for electromagnetic waves with broken internal symmetries are easily developed. Finally, the broken symmetry properties of the vacuum are obtained by solving a set of coupled differential equations which are similar in form to the corresponding equations for asymmetric bulk matter.
THE BROKEN SYMMETRY OF SPACE AND
TIME  IN  BULK  MATTER  AND  THE  VACUUM 


Richard  A.  Weiss 

U.  S.  Army  Engineer  Waterways  Experiment  Station 
Vicksburg,  Mississippi  39180 


ABSTRACT. Because the pressure and internal energy of bulk matter and the vacuum are associated with internal phase angles, the space and time coordinates and the kinematic and dynamic variables of an interacting system of particles also exhibit broken internal symmetries. Specifically, in bulk matter or the vacuum with broken internal symmetries, the internal phase angles of the particle velocity, acceleration, and space and time coordinates are related to the internal phase angles of the pressure and internal energy. A procedure is developed for determining the internal phase angles of the kinematic and dynamic variables and of the space and time coordinates in terms of Euler's equations of motion. Continuum mechanics and elasticity solutions for bulk matter require the joint determination of phase angles for the space and time coordinates and the magnitude and internal phase angle of the pressure. Rotating matter with broken space and time symmetries is treated, and it is shown that the conservation of angular momentum is valid for such a system. The gravitational equilibrium configurations of stars and planets are treated for state equations that have broken internal symmetries, and equations are developed that relate the internal phase angles of the space and time coordinates to the internal phase angle of the pressure. Newtonian gravity in matter with broken internal symmetry is considered and applications to the earth's gravity field are suggested. These results will also affect the predicted trajectories of ballistic missiles.

1. INTRODUCTION. The fundamental interactions in nature are formulated as gauge theories. For instance, the theory of gravity is formulated as a gauge theory based on the Lorentz group SO(3,1), while electromagnetism is based on the gauge group U(1). The nongravitational forces are thought to be described by the gauge group SU(3) × SU(2) × U(1). In fact, the Lie group U(1) and its real-valued analog have been shown to be the gauge groups of relativistic thermodynamics. The pressure and energy density of matter described by relativistic thermodynamics are associated with broken symmetries. This is related to the fact that the pressure and energy density can be gauge rotated in such a way as to leave the terms of the basic trace equation of relativistic thermodynamics gauge invariant.

For an interacting bulk matter system the broken symmetry of the state equation is vacuum induced and results from the solution of a complex number trace equation that relates the renormalized (relativistic) state equation to the corresponding ordinary state equation. This trace equation is given by
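[Equation (1) is illegible in the source scan.]

or equivalently as

$$\left(1 - \bar b + T\frac{\partial}{\partial T} - \bar bV\frac{\partial}{\partial V}\right)\bar E - 3\left(1 + \bar\gamma + V\frac{\partial}{\partial V} - \bar\gamma T\frac{\partial}{\partial T}\right)\bar P = [\text{right-hand side illegible in source}] \qquad (2)$$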

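where $\bar U$, $\bar E$, $\bar P$, $\bar\gamma$, and $\bar b$ are complex number representations of the renormalized internal energy, energy density, pressure, and the gauge parameters, and where

[Equations (3) and (4) are illegible in the source scan; equation (3) defines $\bar b$.]

$$\bar\gamma = \frac{\partial\bar P/\partial T}{\partial\bar E/\partial T} \qquad (5)$$

$$[\text{left-hand side illegible}] = \left(T\frac{\partial}{\partial T} - b_aV\frac{\partial}{\partial V} + 1 - b_a\right)E_a \qquad (6)$$

$$[\text{left-hand side illegible}] = \frac{T\,(\partial P_a/\partial T)_V}{P_a - K_a} \qquad (7)$$

The quantities $E_a$, $P_a$, and $K_a$ = unrenormalized values of the energy density, pressure, and bulk modulus respectively. Throughout this paper the index "a" will refer to nonrelativistic (unrenormalized) calculations. The complex number thermodynamic state functions that appear in equations (1) and (2) will be written in terms of their internal phase angles as follows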


$$\bar U = U\,e^{j\theta_U} \qquad (8)$$

$$\bar E = \bar U/V = E\,e^{j\theta_U} \qquad (9)$$

$$\bar P = P\,e^{j\theta_P} \qquad (10)$$

$$\bar\gamma = \gamma\,e^{j\theta_\gamma} \qquad (11)$$

$$\bar b = b\,e^{j\theta_b} \qquad (12)$$

where $\theta_U$, $\theta_P$, $\theta_\gamma$ and $\theta_b$ = internal phase angles of the internal energy, pressure, Grüneisen parameter, and b gauge parameter respectively. The relativistic ground state of the vacuum is described by equation (1) or (2) with their right-hand sides set equal to zero. The vacuum state also has a broken symmetry, and in fact the bulk matter state is essentially mathematically equivalent to the vacuum state.

On account of the broken symmetry of the pressure and energy density of bulk matter or the vacuum, time may not unfold in a purely linear fashion but may also rotate in an internal space. Spatial coordinates in bulk matter or the vacuum may also have broken internal symmetries that are associated with internal phase angles. The broken symmetries of space and time in bulk matter or the vacuum are related to the broken symmetries of the state equations for these systems. Thermodynamic and continuum mechanics theories will require the joint determination of the internal phase angles of space and time coordinates along with the pressure and internal energy and their internal phase angles. The gauge rotated space and time coordinates have an effect on the equations of motion of a system of particles and will affect the equilibrium configurations of atomic nuclei, planets and the stars. Note that it is the real parts of the complex number quantities such as space and time coordinates, pressure, energy, velocity and acceleration that are the measured quantities.

The broken symmetry of space and time is related to the broken symmetry of the pressure and internal energy of bulk matter or the vacuum as determined from solutions of equation (1). The right-hand side of equation (1) is equal to zero for the case of the vacuum. When matter is present the broken symmetry of space and time can be calculated in two ways: first, at the macroscopic level through Euler's equations and the complex pressure field for interacting matter (Section 6), and second, at the single-particle level through the action of a complex gauge potential that is induced by vacuum effects. For the vacuum only the second method is possible because the matter density is zero, and the complex gauge potential for the vacuum must be determined.

The complex gauge potential is calculated from the relativistic internal energy and pressure that are obtained from equation (1). This is done by calculating the renormalized complex valued partition function, which is defined as

$$\bar Z = \int n\,e^{-\beta\bar H}\,dq\,dp \qquad (13)$$

where n = degeneracy, β = 1/(kT), and where the complex number Hamiltonian is given by

$$\bar H = \frac{p^2}{2m} + \bar W \qquad (14)$$

where $\bar W$ is the sum of the ordinary external potential and the complex number gauge potential $\bar V_g$, which is responsible for the difference between $\bar U$ and the unrenormalized internal energy given in equation (1). The connection between the internal energy and pressure and the partition function is given by
$$\bar P = \frac{1}{\beta}\left(\frac{\partial\ln\bar Z}{\partial V}\right)_T \qquad (15)$$

where $\bar U$ is given by equation (1), so that equations (13) through (15) can be used to determine the complex gauge potential $\bar V_g$ in terms of P and $\theta_P$ of the complex matter fields. These equations relate the macroscopic pressure field given by equation (1) to the microscopic gauge potential $\bar V_g$. For the broken symmetry vacuum the partition function is

$$\bar Z^{(v)} = [\text{illegible in source}] \qquad (13A)$$

from which $\bar U^{(v)}$ and $\bar P^{(v)}$ can be obtained using equation (15). These values of $\bar U^{(v)}$ and $\bar P^{(v)}$ must agree with the vacuum solutions of equation (1), and this determines $\bar V_g^{(v)}$.

The broken symmetry of the state functions of interacting bulk matter and the vacuum imparts a broken symmetry to the velocity, acceleration and space and time coordinates of particles located in bulk matter or the vacuum. Forces exerted in bulk matter or the vacuum will also exhibit broken internal symmetries. The aim of this paper is to relate the broken symmetries of space, time and the kinematic and dynamical variables to the broken symmetry of the state equations for interacting bulk matter or the vacuum. The paper is organized as follows: Section 2 introduces gauge rotated coordinates, Section 3 treats the geometry of broken internal symmetry, Section 4 considers the kinematics and dynamics of broken symmetry particle systems, Section 5 studies rotating systems with broken internal symmetry, Section 6 introduces the Euler equations for bulk matter with broken symmetry, and Section 7 considers the equilibrium equations of stars and planets whose matter has internal phase.

2. GAUGE ROTATED SPACE AND TIME. In bulk matter or the vacuum the thermodynamic functions such as pressure and internal energy exhibit internal phases (broken symmetry). This suggests that space and time coordinates in bulk matter or the vacuum may also possess broken symmetries. Accordingly the space and time coordinates of particles in bulk matter are written as

$$\bar x = x\,e^{j\theta_x} \qquad (16)$$

$$\bar y = y\,e^{j\theta_y} \qquad (17)$$

$$\bar z = z\,e^{j\theta_z} \qquad (18)$$

$$\bar t = t\,e^{j\theta_t} \qquad (19)$$

where the phase angles $\theta_x$, $\theta_y$, $\theta_z$, $\theta_t$ manifest the broken symmetry. It will be assumed that in bulk matter the phase angles can be represented as
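$$\theta_x = \theta_x(x,y,z,t) \qquad (20)$$

$$\theta_y = \theta_y(x,y,z,t) \qquad (21)$$

$$\theta_z = \theta_z(x,y,z,t) \qquad (22)$$

$$\theta_t = \theta_t(x,y,z,t) \qquad (23)$$

For the vacuum, coordinates will be written as

$$\bar x^{(v)} = x^{(v)}e^{j\theta_x^{(v)}} \qquad (23A)$$

$$\bar y^{(v)} = y^{(v)}e^{j\theta_y^{(v)}} \qquad (23B)$$

$$\bar z^{(v)} = z^{(v)}e^{j\theta_z^{(v)}} \qquad (23C)$$

$$\bar t^{(v)} = t^{(v)}e^{j\theta_t^{(v)}} \qquad (23D)$$

The differentials of the space and time coordinates can be written as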


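$$d\bar x = e^{j\theta_x}(dx + jx\,d\theta_x) = e^{j\theta_x}\left[\left(1 + jx\frac{\partial\theta_x}{\partial x}\right)dx + jx\frac{\partial\theta_x}{\partial y}dy + jx\frac{\partial\theta_x}{\partial z}dz + jx\frac{\partial\theta_x}{\partial t}dt\right] \qquad (24)$$

$$d\bar y = e^{j\theta_y}(dy + jy\,d\theta_y) = e^{j\theta_y}\left[jy\frac{\partial\theta_y}{\partial x}dx + \left(1 + jy\frac{\partial\theta_y}{\partial y}\right)dy + jy\frac{\partial\theta_y}{\partial z}dz + jy\frac{\partial\theta_y}{\partial t}dt\right] \qquad (25)$$

$$d\bar z = e^{j\theta_z}(dz + jz\,d\theta_z) = e^{j\theta_z}\left[jz\frac{\partial\theta_z}{\partial x}dx + jz\frac{\partial\theta_z}{\partial y}dy + \left(1 + jz\frac{\partial\theta_z}{\partial z}\right)dz + jz\frac{\partial\theta_z}{\partial t}dt\right] \qquad (26)$$

$$d\bar t = e^{j\theta_t}(dt + jt\,d\theta_t) = e^{j\theta_t}\left[jt\frac{\partial\theta_t}{\partial x}dx + jt\frac{\partial\theta_t}{\partial y}dy + jt\frac{\partial\theta_t}{\partial z}dz + \left(1 + jt\frac{\partial\theta_t}{\partial t}\right)dt\right] \qquad (27)$$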

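From equations (24) through (27) it follows that

$$\frac{\partial\bar x}{\partial x} = \sqrt{1 + x^2(\partial\theta_x/\partial x)^2}\;e^{j(\theta_x+\theta_{x,x})} \qquad (28)$$

$$\frac{\partial\bar x}{\partial y} = x\,\frac{\partial\theta_x}{\partial y}\,e^{j(\theta_x+\pi/2)} \qquad (29)$$

$$\frac{\partial\bar x}{\partial z} = x\,\frac{\partial\theta_x}{\partial z}\,e^{j(\theta_x+\pi/2)} \qquad (30)$$

$$\frac{\partial\bar x}{\partial t} = x\,\frac{\partial\theta_x}{\partial t}\,e^{j(\theta_x+\pi/2)} \qquad (30A)$$

$$\frac{\partial\bar y}{\partial x} = y\,\frac{\partial\theta_y}{\partial x}\,e^{j(\theta_y+\pi/2)} \qquad (31)$$

$$\frac{\partial\bar y}{\partial y} = \sqrt{1 + y^2(\partial\theta_y/\partial y)^2}\;e^{j(\theta_y+\theta_{y,y})} \qquad (32)$$

$$\frac{\partial\bar y}{\partial z} = y\,\frac{\partial\theta_y}{\partial z}\,e^{j(\theta_y+\pi/2)} \qquad (33)$$

$$\frac{\partial\bar y}{\partial t} = y\,\frac{\partial\theta_y}{\partial t}\,e^{j(\theta_y+\pi/2)} \qquad (33A)$$

$$\frac{\partial\bar z}{\partial x} = z\,\frac{\partial\theta_z}{\partial x}\,e^{j(\theta_z+\pi/2)} \qquad (34)$$

$$\frac{\partial\bar z}{\partial y} = z\,\frac{\partial\theta_z}{\partial y}\,e^{j(\theta_z+\pi/2)} \qquad (35)$$

$$\frac{\partial\bar z}{\partial z} = \sqrt{1 + z^2(\partial\theta_z/\partial z)^2}\;e^{j(\theta_z+\theta_{z,z})} \qquad (36)$$

$$\frac{\partial\bar z}{\partial t} = z\,\frac{\partial\theta_z}{\partial t}\,e^{j(\theta_z+\pi/2)} \qquad (36A)$$

$$\frac{\partial\bar t}{\partial x} = t\,\frac{\partial\theta_t}{\partial x}\,e^{j(\theta_t+\pi/2)} \qquad (37A)$$

$$\frac{\partial\bar t}{\partial y} = t\,\frac{\partial\theta_t}{\partial y}\,e^{j(\theta_t+\pi/2)} \qquad (37B)$$

$$\frac{\partial\bar t}{\partial z} = t\,\frac{\partial\theta_t}{\partial z}\,e^{j(\theta_t+\pi/2)} \qquad (37C)$$

$$\frac{\partial\bar t}{\partial t} = \sqrt{1 + t^2(\partial\theta_t/\partial t)^2}\;e^{j(\theta_t+\theta_{t,t})} \qquad (37D)$$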

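where in equations (28) through (37D) the following notation is used

$$\tan\theta_{x,x} = x\,\frac{\partial\theta_x}{\partial x} \qquad (38)$$

$$\tan\theta_{y,y} = y\,\frac{\partial\theta_y}{\partial y} \qquad (39)$$

$$\tan\theta_{z,z} = z\,\frac{\partial\theta_z}{\partial z} \qquad (40)$$

$$\tan\theta_{t,t} = t\,\frac{\partial\theta_t}{\partial t} \qquad (41)$$

The following angles are also useful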
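$$\tan\theta_{x,y} = x\,\frac{\partial\theta_x}{\partial y},\qquad \tan\theta_{x,z} = x\,\frac{\partial\theta_x}{\partial z},\qquad \tan\theta_{x,t} = x\,\frac{\partial\theta_x}{\partial t} \qquad (42)$$

$$\tan\theta_{y,x} = y\,\frac{\partial\theta_y}{\partial x},\qquad \tan\theta_{y,z} = y\,\frac{\partial\theta_y}{\partial z},\qquad \tan\theta_{y,t} = y\,\frac{\partial\theta_y}{\partial t} \qquad (43)$$

$$\tan\theta_{z,x} = z\,\frac{\partial\theta_z}{\partial x},\qquad \tan\theta_{z,y} = z\,\frac{\partial\theta_z}{\partial y},\qquad \tan\theta_{z,t} = z\,\frac{\partial\theta_z}{\partial t} \qquad (44)$$

$$\tan\theta_{t,x} = t\,\frac{\partial\theta_t}{\partial x},\qquad \tan\theta_{t,y} = t\,\frac{\partial\theta_t}{\partial y},\qquad \tan\theta_{t,z} = t\,\frac{\partial\theta_t}{\partial z} \qquad (44A)$$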

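From equations (16) through (19) it also follows that

$$\frac{\partial\bar n}{\partial n} = \frac{e^{j\theta_{\partial n}}}{\cos\theta_{n,n}} \qquad (45)$$

where n = x, y, z, and t, and

$$\theta_{\partial n} = \theta_n + \theta_{n,n} \qquad (46)$$

$$\tan\theta_{n,n} = n\,\frac{\partial\theta_n}{\partial n} \qquad (47)$$

$$\cos\theta_{n,n} = \frac{1}{\sqrt{1 + (n\,\partial\theta_n/\partial n)^2}} \qquad (48)$$

The result in equation (45) follows from the fact that if $\bar y$, $\bar z$, and $\bar t$ are constant, then their respective magnitudes y, z, and t are also constant.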

The measured space and time coordinates are $x_m = x\cos\theta_x$, $y_m = y\cos\theta_y$, $z_m = z\cos\theta_z$ and $t_m = t\cos\theta_t$ respectively. Space and time can be represented by helices whose spiral lengths are $L_x = x\sec\theta_{x,x}$, $L_y = y\sec\theta_{y,y}$, $L_z = z\sec\theta_{z,z}$ and $L_t = t\sec\theta_{t,t}$. The conventional coordinates $t_a$, $x_a$, $y_a$, $z_a$ are related to the gauge rotated coordinates by $t_a = t_m$, $x_a = x_m$, $y_a = y_m$ and $z_a = z_m$.
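As a numerical sanity check of equations (16) and (45) through (48), and of the rule that the measured coordinate is the real part of the complex coordinate, the sketch below uses a hypothetical phase field θ_x(x, y, z, t) = 0.1x + 0.05t. That field is an assumption chosen only for illustration; the paper does not specify θ_x. The closed form for ∂x̄/∂x is compared with a central finite difference.

```python
import cmath
import math


def theta_x(x, y, z, t):
    # Hypothetical internal phase field, chosen only for illustration.
    return 0.1 * x + 0.05 * t


def x_bar(x, y, z, t):
    # Gauge rotated coordinate, equation (16): x_bar = x * exp(j * theta_x).
    return x * cmath.exp(1j * theta_x(x, y, z, t))


def dxbar_dx_closed_form(x, y, z, t, h=1e-6):
    # d(theta_x)/dx by central difference, then equations (45)-(48).
    dtheta = (theta_x(x + h, y, z, t) - theta_x(x - h, y, z, t)) / (2 * h)
    theta_xx = math.atan(x * dtheta)                       # equation (47)
    theta_dx = theta_x(x, y, z, t) + theta_xx              # equation (46)
    return cmath.exp(1j * theta_dx) / math.cos(theta_xx)   # equation (45)


def dxbar_dx_numeric(x, y, z, t, h=1e-6):
    # Direct central difference of x_bar with y, z and t held fixed.
    return (x_bar(x + h, y, z, t) - x_bar(x - h, y, z, t)) / (2 * h)


x, y, z, t = 2.0, 0.5, -1.0, 3.0
x_m = x * math.cos(theta_x(x, y, z, t))  # measured coordinate x_m = x cos(theta_x)
print(abs(dxbar_dx_closed_form(x, y, z, t) - dxbar_dx_numeric(x, y, z, t)))
print(x_m, x_bar(x, y, z, t).real)
```

The first printed value should be at round-off level, and the two numbers on the second line should agree, since $x\cos\theta_x$ is exactly the real part of $x e^{j\theta_x}$.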

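The following relationships hold for spherical polar coordinates

$$\bar r = r\,e^{j\theta_r} \qquad (49)$$

$$\bar\psi = \psi\,e^{j\theta_\psi} \qquad (50)$$

$$\bar\phi = \phi\,e^{j\theta_\phi} \qquad (51)$$

where ψ = zenith angle, φ = azimuth angle, and where $\theta_r = \theta_r(r,\psi,\phi,t)$, $\theta_\psi = \theta_\psi(r,\psi,\phi,t)$, and $\theta_\phi = \theta_\phi(r,\psi,\phi,t)$, which gives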
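$$d\bar r = e^{j\theta_r}(dr + jr\,d\theta_r) = e^{j\theta_r}\left[\left(1 + jr\frac{\partial\theta_r}{\partial r}\right)dr + jr\frac{\partial\theta_r}{\partial\psi}d\psi + jr\frac{\partial\theta_r}{\partial\phi}d\phi + jr\frac{\partial\theta_r}{\partial t}dt\right] \qquad (52)$$

$$d\bar\psi = e^{j\theta_\psi}(d\psi + j\psi\,d\theta_\psi) = e^{j\theta_\psi}\left[j\psi\frac{\partial\theta_\psi}{\partial r}dr + \left(1 + j\psi\frac{\partial\theta_\psi}{\partial\psi}\right)d\psi + j\psi\frac{\partial\theta_\psi}{\partial\phi}d\phi + j\psi\frac{\partial\theta_\psi}{\partial t}dt\right] \qquad (53)$$

$$d\bar\phi = e^{j\theta_\phi}(d\phi + j\phi\,d\theta_\phi) = e^{j\theta_\phi}\left[j\phi\frac{\partial\theta_\phi}{\partial r}dr + j\phi\frac{\partial\theta_\phi}{\partial\psi}d\psi + \left(1 + j\phi\frac{\partial\theta_\phi}{\partial\phi}\right)d\phi + j\phi\frac{\partial\theta_\phi}{\partial t}dt\right] \qquad (54)$$

$$d\bar t = e^{j\theta_t}(dt + jt\,d\theta_t) = e^{j\theta_t}\left[jt\frac{\partial\theta_t}{\partial r}dr + jt\frac{\partial\theta_t}{\partial\psi}d\psi + jt\frac{\partial\theta_t}{\partial\phi}d\phi + \left(1 + jt\frac{\partial\theta_t}{\partial t}\right)dt\right] \qquad (54A)$$

and

$$\frac{\partial\bar r}{\partial r} = \sqrt{1 + (r\,\partial\theta_r/\partial r)^2}\;e^{j(\theta_r+\theta_{r,r})} \qquad (55)$$

$$\frac{\partial\bar r}{\partial\psi} = r\,\frac{\partial\theta_r}{\partial\psi}\,e^{j(\theta_r+\pi/2)} \qquad (56)$$

$$\frac{\partial\bar r}{\partial\phi} = r\,\frac{\partial\theta_r}{\partial\phi}\,e^{j(\theta_r+\pi/2)} \qquad (57)$$

$$\frac{\partial\bar r}{\partial t} = r\,\frac{\partial\theta_r}{\partial t}\,e^{j(\theta_r+\pi/2)} \qquad (57A)$$

$$\frac{\partial\bar\psi}{\partial r} = \psi\,\frac{\partial\theta_\psi}{\partial r}\,e^{j(\theta_\psi+\pi/2)} \qquad (58)$$

$$\frac{\partial\bar\psi}{\partial\psi} = \sqrt{1 + (\psi\,\partial\theta_\psi/\partial\psi)^2}\;e^{j(\theta_\psi+\theta_{\psi,\psi})} \qquad (59)$$

$$\frac{\partial\bar\psi}{\partial\phi} = \psi\,\frac{\partial\theta_\psi}{\partial\phi}\,e^{j(\theta_\psi+\pi/2)} \qquad (60)$$

$$\frac{\partial\bar\psi}{\partial t} = \psi\,\frac{\partial\theta_\psi}{\partial t}\,e^{j(\theta_\psi+\pi/2)} \qquad (60A)$$

$$\frac{\partial\bar\phi}{\partial r} = \phi\,\frac{\partial\theta_\phi}{\partial r}\,e^{j(\theta_\phi+\pi/2)} \qquad (61)$$

$$\frac{\partial\bar\phi}{\partial\psi} = \phi\,\frac{\partial\theta_\phi}{\partial\psi}\,e^{j(\theta_\phi+\pi/2)} \qquad (62)$$

$$\frac{\partial\bar\phi}{\partial\phi} = \sqrt{1 + (\phi\,\partial\theta_\phi/\partial\phi)^2}\;e^{j(\theta_\phi+\theta_{\phi,\phi})} \qquad (63)$$

$$\frac{\partial\bar\phi}{\partial t} = \phi\,\frac{\partial\theta_\phi}{\partial t}\,e^{j(\theta_\phi+\pi/2)} \qquad (63A)$$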
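$$\frac{\partial\bar t}{\partial r} = t\,\frac{\partial\theta_t}{\partial r}\,e^{j(\theta_t+\pi/2)} \qquad (63B)$$

$$\frac{\partial\bar t}{\partial\psi} = t\,\frac{\partial\theta_t}{\partial\psi}\,e^{j(\theta_t+\pi/2)} \qquad (63C)$$

$$\frac{\partial\bar t}{\partial\phi} = t\,\frac{\partial\theta_t}{\partial\phi}\,e^{j(\theta_t+\pi/2)} \qquad (63D)$$

$$\tan\theta_{r,r} = r\,\frac{\partial\theta_r}{\partial r} \qquad (64)$$

$$\tan\theta_{\psi,\psi} = \psi\,\frac{\partial\theta_\psi}{\partial\psi} \qquad (65)$$

$$\tan\theta_{\phi,\phi} = \phi\,\frac{\partial\theta_\phi}{\partial\phi} \qquad (66)$$

where the connection between r, $\theta_r$, ψ, $\theta_\psi$, φ, $\theta_\phi$ and x, $\theta_x$, y, $\theta_y$ and z, $\theta_z$ is given in Section 3. One can also define the following angles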


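$$\tan\theta_{r,\psi} = r\,\frac{\partial\theta_r}{\partial\psi},\qquad \tan\theta_{r,\phi} = r\,\frac{\partial\theta_r}{\partial\phi},\qquad \tan\theta_{r,t} = r\,\frac{\partial\theta_r}{\partial t} \qquad (67)$$

$$\tan\theta_{\psi,r} = \psi\,\frac{\partial\theta_\psi}{\partial r},\qquad \tan\theta_{\psi,\phi} = \psi\,\frac{\partial\theta_\psi}{\partial\phi},\qquad \tan\theta_{\psi,t} = \psi\,\frac{\partial\theta_\psi}{\partial t} \qquad (68)$$

$$\tan\theta_{\phi,r} = \phi\,\frac{\partial\theta_\phi}{\partial r},\qquad \tan\theta_{\phi,\psi} = \phi\,\frac{\partial\theta_\phi}{\partial\psi},\qquad \tan\theta_{\phi,t} = \phi\,\frac{\partial\theta_\phi}{\partial t} \qquad (69)$$

$$\tan\theta_{t,r} = t\,\frac{\partial\theta_t}{\partial r},\qquad \tan\theta_{t,\psi} = t\,\frac{\partial\theta_t}{\partial\psi},\qquad \tan\theta_{t,\phi} = t\,\frac{\partial\theta_t}{\partial\phi} \qquad (69A)$$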

The derivatives with respect to the complex spherical polar coordinates are now written in the same form as in equation (45), where now n = t, r, ψ, φ.

The measured space and time coordinates are $r_m = r\cos\theta_r$, $\psi_m = \psi\cos\theta_\psi$, $\phi_m = \phi\cos\theta_\phi$ and $t_m = t\cos\theta_t$ respectively.

The effects of the different types of forces on the gauge rotation of space and time depend on the relative magnitudes and ranges of the forces. Over the smallest distances (< [value illegible in source] cm) the color force dominates; at distances < $10^{-13}$ cm the strong nuclear force between nucleons dominates; and at distances < $10^{-8}$ cm the electric and magnetic forces of electrons and nuclei dominate. For ranges > $10^{-8}$ cm the long range gravitational force dominates. Therefore, when equations (24) through (69) are written, the origin of the coordinates is associated with the origin of the forces involved. Thus for gravity the origin is taken to be the center of the planet or star in question, and the range of r is throughout the gravitating body and beyond because gravity has an infinite range. For nuclear forces the range of r is r < $10^{-13}$ cm, while for electric forces in an atom r < $10^{-8}$ cm. The values of the internal phase angles depend on the scale at which the dominant forces act. It is the real part of a complex number coordinate that is the quantity measured when a space or time coordinate measurement is made.


3.  GEOMETRY  OF  SPACE  IN  BULK  MATTER  AND  THE  VACUUM.  The  broken  symmetry 
of  coordinates  of  particles  located  in  bulk  matter  or  the  vacuum  will  influence 
the  calculation  of  the  effects  of  the  basic  forces  that  operate  in  these  media, 
such  as  for  example  pressure  and  gravity.  This  section  considers  the  effects 
of  the  broken  symmetry  of  coordinates  on  basic  geometrical  quantities  such  as 
angles,  areas,  and  path  lengths.  For  example,  the  simple  law  of  cosines  for  a 
plane  triangle  located  in  a  medium  with  broken  symmetry  is  written  as 


$$\cos\bar\phi = \frac{\bar a^2 + \bar b^2 - \bar c^2}{2\bar a\bar b} \qquad (70)$$


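where $\bar a$, $\bar b$ and $\bar c$ are the complex number sides of a plane triangle, and $\bar\phi$ is the complex angle opposite side $\bar c$. The complex number sides of the triangle can be written as

$$\bar a = a\,e^{j\theta_a} \qquad (71)$$

$$\bar b = b\,e^{j\theta_b} \qquad (72)$$

$$\bar c = c\,e^{j\theta_c} \qquad (73)$$

then

$$\cos\bar\phi = \frac{a}{2b}\,e^{j(\theta_a-\theta_b)} + \frac{b}{2a}\,e^{j(\theta_b-\theta_a)} - \frac{c^2}{2ab}\,e^{j(2\theta_c-\theta_a-\theta_b)} \qquad (74)$$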
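The expansion (74) can be checked numerically against the compact form (70). The sketch below uses assumed side magnitudes near a 3-4-5 triangle with small, arbitrarily chosen internal phase angles; none of these values come from the paper. It also recovers the complex angle $\bar\phi$ with the principal branch of `cmath.acos`.

```python
import cmath


def cos_phi_direct(a_bar, b_bar, c_bar):
    # Law of cosines with complex sides, equation (70).
    return (a_bar**2 + b_bar**2 - c_bar**2) / (2 * a_bar * b_bar)


def cos_phi_expanded(a, th_a, b, th_b, c, th_c):
    # Magnitude/phase form of the same quantity, equation (74).
    return (a / (2 * b) * cmath.exp(1j * (th_a - th_b))
            + b / (2 * a) * cmath.exp(1j * (th_b - th_a))
            - c**2 / (2 * a * b) * cmath.exp(1j * (2 * th_c - th_a - th_b)))


# Illustrative side magnitudes and internal phase angles (assumed values).
a, th_a = 3.0, 0.02
b, th_b = 4.0, -0.01
c, th_c = 5.0, 0.03

a_bar = a * cmath.exp(1j * th_a)
b_bar = b * cmath.exp(1j * th_b)
c_bar = c * cmath.exp(1j * th_c)

cos_phi = cos_phi_direct(a_bar, b_bar, c_bar)
phi_bar = cmath.acos(cos_phi)                     # complex angle opposite side c
phi, th_phi = abs(phi_bar), cmath.phase(phi_bar)  # magnitude and phase as in equation (75)
print(cos_phi, cos_phi_expanded(a, th_a, b, th_b, c, th_c))
print(phi, th_phi)
```

With these nearly real sides the computed $\bar\phi$ stays close to the right angle of the ordinary 3-4-5 triangle, while picking up a small internal phase.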


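From equation (74) it is clear that $\bar\phi$ and $\cos\bar\phi$ are complex numbers, so that

$$\bar\phi = \phi\,e^{j\theta_\phi} \qquad (75)$$

$$\cos\bar\phi = C_\phi\,e^{-j\theta_{c\phi}} \qquad (76)$$

where $C_\phi$ = magnitude of $\cos\bar\phi$, and $\theta_{c\phi}$ = phase angle associated with $\cos\bar\phi$.

In the same manner it follows that

$$\sin\bar\phi = S_\phi\,e^{j\theta_{s\phi}} \qquad (77)$$

where $S_\phi$ = magnitude of $\sin\bar\phi$, and $\theta_{s\phi}$ = phase angle associated with $\sin\bar\phi$. From the well known relation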
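$$\cos\bar\phi = \frac{1}{2}\left[e^{j\bar\phi} + e^{-j\bar\phi}\right] \qquad (78)$$

it follows that

$$\cos\bar\phi = \cos\phi_R\cosh\phi_I - j\sin\phi_R\sinh\phi_I \qquad (79)$$

where from equation (75)

$$\bar\phi = \phi_R + j\phi_I = \phi(\cos\theta_\phi + j\sin\theta_\phi) \qquad (80)$$

Combining equations (76), (79), and (80) gives

$$C_\phi = \sqrt{\cos^2(\phi\cos\theta_\phi) + \sinh^2(\phi\sin\theta_\phi)} \qquad (81)$$

$$\tan\theta_{c\phi} = \tan(\phi\cos\theta_\phi)\tanh(\phi\sin\theta_\phi) \qquad (82)$$

In a similar manner from

$$\sin\bar\phi = \frac{1}{2j}\left[e^{j\bar\phi} - e^{-j\bar\phi}\right] \qquad (83)$$

it follows that

$$\sin\bar\phi = \sin\phi_R\cosh\phi_I + j\cos\phi_R\sinh\phi_I \qquad (84)$$

and combining equations (77), (80) and (84) that

$$S_\phi = \sqrt{\sin^2(\phi\cos\theta_\phi) + \sinh^2(\phi\sin\theta_\phi)} \qquad (85)$$

$$\tan\theta_{s\phi} = \cot(\phi\cos\theta_\phi)\tanh(\phi\sin\theta_\phi) \qquad (86)$$

The law of sines for a plane triangle is given by

$$\frac{\bar a}{\sin\bar A} = \frac{\bar b}{\sin\bar B} = \frac{\bar c}{\sin\bar C} \qquad (87)$$

where

$$\bar A = A\,e^{j\theta_A} \qquad (88)$$

with similar expressions for $\bar B$ and $\bar C$. It follows from equation (87) that

$$\frac{a}{S_A} = \frac{b}{S_B} = \frac{c}{S_C} \qquad (89)$$

and

$$\theta_a - \theta_{sA} = \theta_b - \theta_{sB} = \theta_c - \theta_{sC} \qquad (90)$$

and where

$$S_A = \sqrt{\sin^2(A\cos\theta_A) + \sinh^2(A\sin\theta_A)} \qquad (91)$$

$$\tan\theta_{sA} = \cot(A\cos\theta_A)\tanh(A\sin\theta_A) \qquad (92)$$

with similar expressions for $S_B$, $S_C$, $\theta_{sB}$ and $\theta_{sC}$. It should be noted that for spherical triangles equations (89) and (90) become respectively

$$\frac{S_a}{S_A} = \frac{S_b}{S_B} = \frac{S_c}{S_C} \qquad (93)$$

and

$$\theta_{sa} - \theta_{sA} = \theta_{sb} - \theta_{sB} = \theta_{sc} - \theta_{sC} \qquad (94)$$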
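The closed forms (81), (82), (85) and (86) for the magnitudes and phases of $\cos\bar\phi$ and $\sin\bar\phi$ can be compared with direct complex evaluation. This is a minimal sketch assuming the sign convention $\cos\bar\phi = C_\phi e^{-j\theta_{c\phi}}$, which is the reading of equation (76) that agrees with (82) having no minus sign; the angle $\bar\phi = 1.0\,e^{0.1j}$ is an arbitrary illustrative value. `math.atan` returns the principal branch, which is adequate here because $\cos\phi_R$ and $\sin\phi_R$ are both positive for this example.

```python
import cmath
import math


def cos_mag_phase(phi, th_phi):
    # Equations (81)-(82); theta_c is defined so that cos(phi_bar) = C * exp(-j*theta_c).
    u, v = phi * math.cos(th_phi), phi * math.sin(th_phi)  # phi_R and phi_I, equation (80)
    C = math.sqrt(math.cos(u) ** 2 + math.sinh(v) ** 2)
    theta_c = math.atan(math.tan(u) * math.tanh(v))
    return C, theta_c


def sin_mag_phase(phi, th_phi):
    # Equations (85)-(86); sin(phi_bar) = S * exp(+j*theta_s).
    u, v = phi * math.cos(th_phi), phi * math.sin(th_phi)
    S = math.sqrt(math.sin(u) ** 2 + math.sinh(v) ** 2)
    theta_s = math.atan(math.tanh(v) / math.tan(u))  # cot(u) * tanh(v)
    return S, theta_s


phi, th_phi = 1.0, 0.1  # illustrative complex angle phi_bar = phi * exp(j*th_phi)
phi_bar = phi * cmath.exp(1j * th_phi)
C, theta_c = cos_mag_phase(phi, th_phi)
S, theta_s = sin_mag_phase(phi, th_phi)
print(C * cmath.exp(-1j * theta_c), cmath.cos(phi_bar))
print(S * cmath.exp(1j * theta_s), cmath.sin(phi_bar))
```

Each printed pair should agree to round-off; flipping the sign in the exponent on the first line breaks the agreement, which is why the convention matters for equations such as (112) below.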


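Consider now simple plane areas located within a medium with broken internal symmetry. For example, the area of a triangle of sides $\bar a$, $\bar b$ and $\bar c$ with $\bar\phi$ = angle between sides $\bar a$ and $\bar b$ is given by

$$\bar A = \frac{1}{2}\bar a\bar b\sin\bar\phi = A\,e^{j\theta_A} \qquad (95)$$

where A = magnitude of area, and $\theta_A$ = phase angle of area. Combining equations (71), (72), (77), and (88) gives

$$A = \frac{1}{2}ab\,S_\phi \qquad (96)$$

$$\theta_A = \theta_a + \theta_b + \theta_{s\phi} \qquad (97)$$

where $S_\phi$ and $\theta_{s\phi}$ are given by equations (85) and (86) respectively. Now consider the area of a circular segment of angle $\bar\phi$, which is

$$\bar A = \frac{1}{2}\bar r^2(\bar\phi - \sin\bar\phi) = \frac{1}{2}r^2\phi\,e^{j(2\theta_r+\theta_\phi)} - \frac{1}{2}r^2S_\phi\,e^{j(2\theta_r+\theta_{s\phi})} \qquad (98)$$

then it follows that

$$A\cos\theta_A = \frac{1}{2}r^2\left[\phi\cos(2\theta_r+\theta_\phi) - S_\phi\cos(2\theta_r+\theta_{s\phi})\right] \qquad (99)$$

$$A\sin\theta_A = \frac{1}{2}r^2\left[\phi\sin(2\theta_r+\theta_\phi) - S_\phi\sin(2\theta_r+\theta_{s\phi})\right] \qquad (100)$$

From equations (99) and (100) it follows that

$$\tan\theta_A = \frac{\phi\sin(2\theta_r+\theta_\phi) - S_\phi\sin(2\theta_r+\theta_{s\phi})}{\phi\cos(2\theta_r+\theta_\phi) - S_\phi\cos(2\theta_r+\theta_{s\phi})} \qquad (101)$$

$$A = \frac{1}{2}r^2\left[\phi^2 + S_\phi^2 - 2\phi S_\phi\cos(\theta_\phi-\theta_{s\phi})\right]^{1/2} \qquad (102)$$

For a full circle obviously

$$A = \pi r^2 \qquad (103)$$

$$\theta_A = 2\theta_r \qquad (104)$$

For a rectangle of sides $\bar x$ and $\bar y$ one has

$$A = xy \qquad (105)$$

$$\theta_A = \theta_x + \theta_y \qquad (106)$$

For these cases, measured area = $A\cos\theta_A$.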
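A minimal numerical check of the triangle-area relations (95) through (97), the area formula (98) with its magnitude (102), the full-circle limit (103) and (104), and the rule "measured area = $A\cos\theta_A$". All magnitudes and internal phase angles below are assumed illustrative values, not data from the paper.

```python
import cmath
import math


def triangle_area(a_bar, b_bar, phi_bar):
    # Equation (95): complex area from two complex sides and the included complex angle.
    return 0.5 * a_bar * b_bar * cmath.sin(phi_bar)


def segment_area(r_bar, phi_bar):
    # Equation (98): A_bar = (1/2) r_bar^2 (phi_bar - sin(phi_bar)).
    return 0.5 * r_bar ** 2 * (phi_bar - cmath.sin(phi_bar))


# Illustrative (assumed) magnitudes and internal phase angles.
a, th_a, b, th_b = 3.0, 0.02, 4.0, -0.01
r, th_r = 2.0, 0.03
phi, th_phi = 1.2, 0.05
a_bar, b_bar = a * cmath.exp(1j * th_a), b * cmath.exp(1j * th_b)
r_bar, phi_bar = r * cmath.exp(1j * th_r), phi * cmath.exp(1j * th_phi)
S_phi, th_sphi = abs(cmath.sin(phi_bar)), cmath.phase(cmath.sin(phi_bar))

A_tri = triangle_area(a_bar, b_bar, phi_bar)
A_tri_mag = 0.5 * a * b * S_phi      # equation (96)
A_tri_phase = th_a + th_b + th_sphi  # equation (97)

A_seg = segment_area(r_bar, phi_bar)
A_seg_mag = 0.5 * r ** 2 * math.sqrt(
    phi ** 2 + S_phi ** 2 - 2 * phi * S_phi * math.cos(th_phi - th_sphi))  # equation (102)
measured_seg = A_seg_mag * math.cos(cmath.phase(A_seg))  # measured area = A cos(theta_A)

full_circle = segment_area(r_bar, 2 * math.pi)  # real angle 2*pi, equations (103)-(104)
print(abs(A_tri), A_tri_mag, cmath.phase(A_tri), A_tri_phase)
print(abs(A_seg), A_seg_mag, measured_seg, A_seg.real)
print(abs(full_circle), math.pi * r ** 2, cmath.phase(full_circle), 2 * th_r)
```

The measured area is simply the real part of the complex area, so `measured_seg` and `A_seg.real` coincide.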

Now consider various coordinate systems located in bulk matter or vacuum with broken internal symmetries. For example, for plane polar coordinates

$$\bar x = \bar r\cos\bar\phi = x\,e^{j\theta_x} \qquad (107)$$

$$\bar y = \bar r\sin\bar\phi = y\,e^{j\theta_y} \qquad (108)$$

and

$$\bar x^2 + \bar y^2 = \bar r^2 = r^2e^{2j\theta_r} \qquad (109)$$
The scalar equivalents of equations (107) and (108) are

$$x = rC_\phi \qquad (110)$$

$$y = rS_\phi \qquad (111)$$

$$\theta_x = \theta_r - \theta_{c\phi} \qquad (112)$$

$$\theta_y = \theta_r + \theta_{s\phi} \qquad (113)$$

The scalar equivalents of equation (109) are

$$x^2\cos(2\theta_x) + y^2\cos(2\theta_y) = r^2\cos(2\theta_r) \qquad (114)$$

$$x^2\sin(2\theta_x) + y^2\sin(2\theta_y) = r^2\sin(2\theta_r) \qquad (115)$$

or equivalently

$$r^4 = x^4 + y^4 + 2x^2y^2\cos[2(\theta_x - \theta_y)] \qquad (116)$$

and

$$\tan(2\theta_r) = \frac{x^2\sin(2\theta_x) + y^2\sin(2\theta_y)}{x^2\cos(2\theta_x) + y^2\cos(2\theta_y)} \qquad (117)$$

Finally, substituting equations (110) through (113) into equation (116) gives

$$1 = C_\phi^4 + S_\phi^4 + 2C_\phi^2S_\phi^2\cos[2(\theta_{c\phi} + \theta_{s\phi})] \qquad (118)$$
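The plane-polar relations (110) through (118) can be spot-checked by building $\bar x$ and $\bar y$ from an assumed complex radius and polar angle. This sketch again assumes the convention $\cos\bar\phi = C_\phi e^{-j\theta_{c\phi}}$, $\sin\bar\phi = S_\phi e^{j\theta_{s\phi}}$; the specific values of $\bar r$ and $\bar\phi$ are illustrative only. Identity (118) is just $|\cos^2\bar\phi + \sin^2\bar\phi|^2 = 1$ written in magnitude/phase form.

```python
import cmath
import math

# Illustrative (assumed) complex radius and polar angle.
r_bar = 2.0 * cmath.exp(0.1j)
phi_bar = 0.7 * cmath.exp(0.2j)

x_bar = r_bar * cmath.cos(phi_bar)  # equation (107)
y_bar = r_bar * cmath.sin(phi_bar)  # equation (108)
x, th_x = abs(x_bar), cmath.phase(x_bar)
y, th_y = abs(y_bar), cmath.phase(y_bar)
r, th_r = abs(r_bar), cmath.phase(r_bar)

# Sign convention: cos(phi_bar) = C exp(-j th_c), sin(phi_bar) = S exp(+j th_s).
C, th_c = abs(cmath.cos(phi_bar)), -cmath.phase(cmath.cos(phi_bar))
S, th_s = abs(cmath.sin(phi_bar)), cmath.phase(cmath.sin(phi_bar))

rhs_116 = x ** 4 + y ** 4 + 2 * x ** 2 * y ** 2 * math.cos(2 * (th_x - th_y))
rhs_118 = C ** 4 + S ** 4 + 2 * C ** 2 * S ** 2 * math.cos(2 * (th_c + th_s))
print(r ** 4, rhs_116)  # equation (116)
print(rhs_118)          # equation (118), should be 1 up to round-off
```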

Consider  now  spherical  coordinates  located  within  bulk  matter.  For  this 
system 


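$$\bar x = \bar r\sin\bar\psi\cos\bar\phi \qquad (119)$$

$$\bar y = \bar r\sin\bar\psi\sin\bar\phi \qquad (120)$$

$$\bar z = \bar r\cos\bar\psi \qquad (121)$$

$$\bar x^2 + \bar y^2 + \bar z^2 = \bar r^2 \qquad (122)$$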

The scalar equivalent equations for equations (119) through (121) are

$$x = rS_\psi C_\phi \qquad (123)$$

$$y = rS_\psi S_\phi \qquad (124)$$

$$z = rC_\psi \qquad (125)$$

and

$$\theta_x = \theta_r + \theta_{s\psi} - \theta_{c\phi} \qquad (126)$$

$$\theta_y = \theta_r + \theta_{s\psi} + \theta_{s\phi} \qquad (127)$$

$$\theta_z = \theta_r - \theta_{c\psi} \qquad (128)$$

where $C_\psi$, $C_\phi$ and $S_\psi$, $S_\phi$ are defined in equations (81) and (85) respectively, and $\theta_{c\psi}$, $\theta_{c\phi}$ and $\theta_{s\psi}$, $\theta_{s\phi}$ by equations (82) and (86) respectively. From equation (122) it follows that

$$x^2\cos(2\theta_x) + y^2\cos(2\theta_y) + z^2\cos(2\theta_z) = r^2\cos(2\theta_r) \qquad (129)$$

$$x^2\sin(2\theta_x) + y^2\sin(2\theta_y) + z^2\sin(2\theta_z) = r^2\sin(2\theta_r) \qquad (130)$$

Equations (129) and (130) give

$$r^4 = x^4 + y^4 + z^4 + 2x^2y^2\cos[2(\theta_x-\theta_y)] + 2y^2z^2\cos[2(\theta_y-\theta_z)] + 2x^2z^2\cos[2(\theta_x-\theta_z)] \qquad (131)$$

$$\tan(2\theta_r) = \frac{x^2\sin(2\theta_x) + y^2\sin(2\theta_y) + z^2\sin(2\theta_z)}{x^2\cos(2\theta_x) + y^2\cos(2\theta_y) + z^2\cos(2\theta_z)} \qquad (132)$$

From equations (123) through (125) and equation (131) it follows that

$$\begin{aligned} 1 = {} & S_\psi^4C_\phi^4 + S_\psi^4S_\phi^4 + C_\psi^4 + 2S_\psi^4C_\phi^2S_\phi^2\cos[2(\theta_{c\phi}+\theta_{s\phi})] \\ & + 2S_\psi^2S_\phi^2C_\psi^2\cos[2(\theta_{s\psi}+\theta_{s\phi}+\theta_{c\psi})] \\ & + 2S_\psi^2C_\phi^2C_\psi^2\cos[2(\theta_{s\psi}-\theta_{c\phi}+\theta_{c\psi})] \end{aligned} \qquad (133)$$
and 

that 
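The last type of coordinate system that will be considered is the polar space coordinates, which utilize direction cosines as follows

$$\bar x = \bar r\cos\bar\alpha \qquad (134)$$

$$\bar y = \bar r\cos\bar\beta \qquad (135)$$

$$\bar z = \bar r\cos\bar\gamma \qquad (136)$$

$$\bar r^2 = \bar x^2 + \bar y^2 + \bar z^2 \qquad (137)$$

It follows from equations (134) through (136) that

$$x = rC_\alpha \qquad (138)$$

$$y = rC_\beta \qquad (139)$$

$$z = rC_\gamma \qquad (140)$$

$$\theta_x = \theta_r - \theta_{c\alpha} \qquad (141)$$

$$\theta_y = \theta_r - \theta_{c\beta} \qquad (142)$$

$$\theta_z = \theta_r - \theta_{c\gamma} \qquad (143)$$

where

$$C_\alpha = \sqrt{\cos^2(\alpha\cos\theta_\alpha) + \sinh^2(\alpha\sin\theta_\alpha)} \qquad (144)$$

$$\tan\theta_{c\alpha} = \tan(\alpha\cos\theta_\alpha)\tanh(\alpha\sin\theta_\alpha) \qquad (145)$$

with similar expressions for $C_\beta$, $C_\gamma$, $\theta_{c\beta}$, and $\theta_{c\gamma}$. Equations (129) through (132) also hold for polar space coordinates. The equivalent of equation (133) for polar space coordinates is

$$1 = C_\alpha^4 + C_\beta^4 + C_\gamma^4 + 2C_\alpha^2C_\beta^2\cos[2(\theta_{c\beta}-\theta_{c\alpha})] + 2C_\beta^2C_\gamma^2\cos[2(\theta_{c\gamma}-\theta_{c\beta})] + 2C_\alpha^2C_\gamma^2\cos[2(\theta_{c\gamma}-\theta_{c\alpha})] \qquad (146)$$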
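Identity (146) and the closed forms (144) and (145) can be checked as follows. In this sketch the direction angles $\bar\alpha$ and $\bar\beta$ are assumed illustrative values, and the third direction cosine is fixed by requiring $\cos^2\bar\alpha + \cos^2\bar\beta + \cos^2\bar\gamma = 1$, so that (137) holds. The sign convention $\cos = C e^{-j\theta_c}$ is assumed, as in (141) through (143).

```python
import cmath
import math

# Assumed complex direction angles alpha_bar and beta_bar (illustration only).
alpha, th_alpha = 1.0, 0.05
beta, th_beta = 1.2, -0.03
alpha_bar = alpha * cmath.exp(1j * th_alpha)
beta_bar = beta * cmath.exp(1j * th_beta)
cos_a, cos_b = cmath.cos(alpha_bar), cmath.cos(beta_bar)
cos_g = cmath.sqrt(1 - cos_a ** 2 - cos_b ** 2)  # enforces the sum of squared cosines = 1


def mag_phase(cos_value):
    # Returns (C, theta_c) with cos = C * exp(-j*theta_c).
    return abs(cos_value), -cmath.phase(cos_value)


(Ca, tha), (Cb, thb), (Cg, thg) = map(mag_phase, (cos_a, cos_b, cos_g))

# Right-hand side of equation (146).
rhs_146 = (Ca ** 4 + Cb ** 4 + Cg ** 4
           + 2 * Ca ** 2 * Cb ** 2 * math.cos(2 * (thb - tha))
           + 2 * Cb ** 2 * Cg ** 2 * math.cos(2 * (thg - thb))
           + 2 * Ca ** 2 * Cg ** 2 * math.cos(2 * (thg - tha)))

# Closed forms (144) and (145) for C_alpha and theta_c_alpha.
u, v = alpha * math.cos(th_alpha), alpha * math.sin(th_alpha)
Ca_144 = math.sqrt(math.cos(u) ** 2 + math.sinh(v) ** 2)
tan_tha_145 = math.tan(u) * math.tanh(v)
print(rhs_146, Ca, Ca_144, math.tan(tha), tan_tha_145)
```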


Consider  now  the  case  of  rotation  of  coordinates  in  a  plane  that  is  located 
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within  bulk  matter  or  vacuum  with  broken  symmetry.  The  values  of  the  coordi¬ 
nates  in  the  cartesian  system  that  is  rotated  through  an  angle  ^  are 


x'  =  X  cos  ^  +  y  sin  ^ 
y'  =  y  cos  ^  -  X  sin  $ 

The  component  equations  for  equation  (147)  are 


(147) 

(148) 


x'  cos  9'  =  xC,  cos  (0  -  9  ^)  +  yS,  cos  (9  +  9  ^) 
X  ({)  '  X  c(fi'^  ■’  4i  y  s<j) 

x'  sin  0'  =  xC.  sin  (9  -  9  ,)  +  yS,  sin  (0  +  9  .) 

X  ({)  X  c<|)  4i  y  S(J) 

while  the  component  equations  for  equation  (148)  are 


(149) 

(150) 


y  <p  y 

3'=yC,  sin(0  -9,> 

y  ^  y  c<i>‘ 


-  xS.  cos  (9 

(j)  X 

s4) 

(151) 

-  xS.  sin  (0 

9  X 

(152) 

it  follows  that 

COS  -  9  - 

X  y 

®  A  -  9 

C(()  s<}> 

(153) 

cos  [0  “  0  + 

X  y 

9  +  0  ] 

C(|)  S(j) 

(154) 

^  ^  ^  ■-  X  y  C(()  s<}>-‘ 

(y')^  =  y^cl  +  -  2xyC,S^  cos  [9  -  9  +9^+0^]  (15^ 

■'  41  <P  'l>4>  X  y  C(|)  s(j) 

The  coordinate  internal  phase  angles  in  the  rotated  system  are  given  by 


tan  9  = 

X 


tan  9 '  = 


sin 

CD 

+ 

^^4 

sin 

CD 

+ 

X 

0 

-e- 

cos 

('’x  - 

+ 

cos 

+ 

sin 

- 

xS  , 

<P 

sin 

+ 

COS 

- 

cos 

(«x 

(155) 


(156) 


From  equations  (153)  and  (154)  it  follows  that 

$$(x')^2 + (y')^2 = (x^2 + y^2)(C_\phi^2 + S_\phi^2) + 4xyC_\phi S_\phi\sin(\theta_x - \theta_y)\sin(\theta_{c\phi} + \theta_{s\phi}) \qquad (156A)$$

which reduces to the standard cartesian result, $(x')^2 + (y')^2 = x^2 + y^2$, when the internal phase angles are set equal to zero. The Lorentz group of rotations in spacetime is treated in an accompanying paper, where Maxwell's equations with broken internal symmetry are considered.
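The component equations (149) through (152) can be checked numerically by treating each coordinate as a complex number with a magnitude and an internal phase angle. The sketch below is a minimal illustration, not the author's procedure. It assumes that $C_\phi$, $\theta_{c\phi}$, $S_\phi$, and $\theta_{s\phi}$ are supplied as plain numbers; the function names are hypothetical. It rebuilds $x'$ and $y'$ from (149) through (152) as written above and compares $(x')^2 + (y')^2$ with the closed form (156A).

```python
import cmath
import math


def rotate_components(x, th_x, y, th_y, C, th_c, S, th_s):
    """Rotated coordinates as complex numbers, following (149)-(152).

    Real parts are the cosine components and imaginary parts the sine
    components. C, th_c, S, th_s are assumed magnitudes and phases of the
    complex cosine and sine of the rotation angle (hypothetical inputs).
    """
    xp = x * C * cmath.exp(1j * (th_x - th_c)) + y * S * cmath.exp(1j * (th_y + th_s))
    yp = y * C * cmath.exp(1j * (th_y - th_c)) - x * S * cmath.exp(1j * (th_x + th_s))
    return xp, yp


def sum_of_squares_156A(x, th_x, y, th_y, C, th_c, S, th_s):
    """Closed form (156A) for (x')^2 + (y')^2."""
    return ((x**2 + y**2) * (C**2 + S**2)
            + 4 * x * y * C * S * math.sin(th_x - th_y) * math.sin(th_c + th_s))


args = (1.3, 0.2, -0.7, 0.5, 0.8, 0.1, 0.6, -0.3)
xp, yp = rotate_components(*args)
# The direct sum of squared magnitudes and the closed form should agree.
print(abs(xp) ** 2 + abs(yp) ** 2, sum_of_squares_156A(*args))
```

Set all phases to zero and take $C_\phi = \cos\phi$, $S_\phi = \sin\phi$. Both quantities then reduce to $x^2 + y^2$, the ordinary cartesian invariant.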
4. BROKEN SYMMETRY OF KINEMATICAL AND DYNAMICAL VARIABLES. This section considers the effects of gauge rotated space and time on kinematics and dynamics. The gauge rotated space and time coordinates that were introduced in Section 2 can be used to define gauge rotated velocity and acceleration of particles located within bulk matter or the vacuum. For instance the components of the velocity of a particle are given by


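$$\bar v_x = \frac{d\bar x}{d\bar t} = v_xe^{j\theta_{vx}} \qquad (157)$$

$$\bar v_y = \frac{d\bar y}{d\bar t} = v_ye^{j\theta_{vy}} \qquad (158)$$

$$\bar v_z = \frac{d\bar z}{d\bar t} = v_ze^{j\theta_{vz}} \qquad (159)$$

where

$$v_x = \left[\frac{(dx/dt)^2 + x^2\omega_{\theta x}^2}{1 + t^2\omega_{\theta t}^2}\right]^{1/2} \qquad (160)$$

$$v_y = \left[\frac{(dy/dt)^2 + y^2\omega_{\theta y}^2}{1 + t^2\omega_{\theta t}^2}\right]^{1/2} \qquad (161)$$

$$v_z = \left[\frac{(dz/dt)^2 + z^2\omega_{\theta z}^2}{1 + t^2\omega_{\theta t}^2}\right]^{1/2} \qquad (162)$$

$$\theta_{vx} = \theta_x + \beta_{x,t} - \theta_t - \beta_{t,t} \qquad (163)$$

$$\theta_{vy} = \theta_y + \beta_{y,t} - \theta_t - \beta_{t,t} \qquad (164)$$

$$\theta_{vz} = \theta_z + \beta_{z,t} - \theta_t - \beta_{t,t} \qquad (165)$$

where the internal angular velocities are given by

$$\omega_{\theta t} = d\theta_t/dt \qquad (166)$$

$$\omega_{\theta x} = d\theta_x/dt, \qquad \omega_{\theta y} = d\theta_y/dt, \qquad \omega_{\theta z} = d\theta_z/dt \qquad (167)$$

and where

$$\tan\beta_{x,t} = x\,\frac{d\theta_x/dt}{dx/dt} \qquad (168)$$

$$\tan\beta_{y,t} = y\,\frac{d\theta_y/dt}{dy/dt} \qquad (169)$$

$$\tan\beta_{z,t} = z\,\frac{d\theta_z/dt}{dz/dt}$$

and where $\beta_{t,t}$, which is given by equation (41), can also be written as

$$\tan\beta_{t,t} = t\,\frac{d\theta_t}{dt} = t\,\omega_{\theta t} \qquad (170)$$

The internal angular velocities are

$$\omega_{\theta x} = \frac{d\theta_x}{dt} = \frac{\partial\theta_x}{\partial t} + \frac{\partial\theta_x}{\partial x}\frac{dx}{dt} + \frac{\partial\theta_x}{\partial y}\frac{dy}{dt} + \frac{\partial\theta_x}{\partial z}\frac{dz}{dt} \qquad (171)$$

$$\omega_{\theta y} = \frac{d\theta_y}{dt} = \frac{\partial\theta_y}{\partial t} + \frac{\partial\theta_y}{\partial x}\frac{dx}{dt} + \frac{\partial\theta_y}{\partial y}\frac{dy}{dt} + \frac{\partial\theta_y}{\partial z}\frac{dz}{dt} \qquad (172)$$

$$\omega_{\theta z} = \frac{d\theta_z}{dt} = \frac{\partial\theta_z}{\partial t} + \frac{\partial\theta_z}{\partial x}\frac{dx}{dt} + \frac{\partial\theta_z}{\partial y}\frac{dy}{dt} + \frac{\partial\theta_z}{\partial z}\frac{dz}{dt} \qquad (173)$$

$$\omega_{\theta t} = \frac{d\theta_t}{dt} = \frac{\partial\theta_t}{\partial t} + \frac{\partial\theta_t}{\partial x}\frac{dx}{dt} + \frac{\partial\theta_t}{\partial y}\frac{dy}{dt} + \frac{\partial\theta_t}{\partial z}\frac{dz}{dt} \qquad (173A)$$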
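Equations (160), (163), (168), and (170) give the magnitude and phase of $d\bar x/d\bar t$ in closed form. The minimal sketch below assumes that the instantaneous values of $x$, $dx/dt$, $\theta_x$, $\omega_{\theta x}$, $t$, $\theta_t$, and $\omega_{\theta t}$ are known; the function names are hypothetical. It evaluates these formulas and compares the result with direct complex division of the differentials of $\bar x = xe^{j\theta_x}$ and $\bar t = te^{j\theta_t}$. Using `atan2` for (168) resolves the quadrant; this is an implementation choice, not part of the paper.

```python
import cmath
import math


def complex_velocity(x, dxdt, th_x, w_th_x, t, th_t, w_th_t):
    """Return (v_x, theta_vx) from equations (160), (163), (168) and (170)."""
    v_x = math.hypot(dxdt, x * w_th_x) / math.hypot(1.0, t * w_th_t)  # (160)
    beta_xt = math.atan2(x * w_th_x, dxdt)      # (168), quadrant-aware form
    beta_tt = math.atan(t * w_th_t)             # (170)
    theta_vx = th_x + beta_xt - th_t - beta_tt  # (163)
    return v_x, theta_vx


def complex_velocity_direct(x, dxdt, th_x, w_th_x, t, th_t, w_th_t):
    """Same quantity by direct complex arithmetic on d(x-bar)/d(t-bar)."""
    num = cmath.exp(1j * th_x) * (dxdt + 1j * x * w_th_x)
    den = cmath.exp(1j * th_t) * (1.0 + 1j * t * w_th_t)
    return num / den


v, th = complex_velocity(2.0, 0.5, 0.3, 0.1, 4.0, 0.2, 0.05)
print(v, th, complex_velocity_direct(2.0, 0.5, 0.3, 0.1, 4.0, 0.2, 0.05))
```

With $\theta_x$, $\theta_t$, $\omega_{\theta x}$, and $\omega_{\theta t}$ all zero, `complex_velocity` returns $v_x = dx/dt$ with zero phase for $dx/dt > 0$, recovering the ordinary velocity.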


The conventional special relativistic momentum of a particle moving with a velocity $v_x^a$ is given for a conventional dynamical system by the following standard formula

$$p_x^a = m\gamma^av_x^a \qquad (174)$$

where $m$ = mass, $v_x^a = dx^a/dt^a = dx_m/dt_m$ = conventionally calculated velocity, and $\gamma^a$ = ordinary velocity factor (boost) given by

$$\gamma^a = \left[1 - (v_x^a/c)^2\right]^{-1/2} \qquad (175)$$

where $c$ = light speed in the vacuum. These standard formulas are developed by considering the particle to be attached to a coordinate system moving with velocity $v = v_x^a$. In this paper the generalization to bulk matter or vacuum with broken internal symmetries is made by considering the particle to be attached to a coordinate system moving with complex velocity $\bar v = \bar v_x$, so that the single particle momentum is written as

$$\bar p_x = m\bar\gamma_x\bar v_x = m\gamma_xv_xe^{j(\theta_{\gamma x} + \theta_{vx})} \qquad (176)$$

where

$$\bar\gamma_x = \gamma_xe^{j\theta_{\gamma x}} = \left(1 - \bar v_x^2/c^2\right)^{-1/2} \qquad (177)$$

gives the complex number velocity factor. The magnitude and phase angle of the complex number velocity factor are given by

$$\gamma_x = \left[1 - 2(v_x/c)^2\cos(2\theta_{vx}) + (v_x/c)^4\right]^{-1/4} \qquad (178)$$

$$\tan(2\theta_{\gamma x}) = \frac{(v_x/c)^2\sin(2\theta_{vx})}{1 - (v_x/c)^2\cos(2\theta_{vx})} \qquad (179)$$

The results in equations (178) and (179) are obtained as a simple generalization of standard special relativity results to the case where space and time have intrinsic broken symmetry. They reduce to the standard result in equation (175) if the internal phase angles are set equal to zero. Note that the measured velocity is $v_{xm} = v_x\cos\theta_{vx} \neq v_x^a$.
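Equations (177) through (179) can be evaluated directly. The sketch below uses hypothetical function names and sets $c = 1$ by default. It compares the closed forms (178) and (179) with the principal branch of $(1 - \bar v_x^2/c^2)^{-1/2}$ computed by `cmath`, and checks the reduction to equation (175) when $\theta_{vx} = 0$. The `atan2` form of (179) is an implementation choice that fixes the quadrant; it agrees with the principal square root for $v_x < c$.

```python
import cmath
import math


def velocity_factor(v, theta_v, c=1.0):
    """Magnitude (178) and phase (179) of the complex velocity factor (177)."""
    u = (v / c) ** 2
    gamma = (1.0 - 2.0 * u * math.cos(2.0 * theta_v) + u * u) ** -0.25  # (178)
    theta_gamma = 0.5 * math.atan2(u * math.sin(2.0 * theta_v),
                                   1.0 - u * math.cos(2.0 * theta_v))   # (179)
    return gamma, theta_gamma


def velocity_factor_direct(v, theta_v, c=1.0):
    """Principal branch of (1 - v_bar**2 / c**2) ** (-1/2), for comparison."""
    v_bar = v * cmath.exp(1j * theta_v)
    return 1.0 / cmath.sqrt(1.0 - (v_bar / c) ** 2)


print(velocity_factor(0.6, 0.0))   # ordinary boost of (175): about 1.25, zero phase
print(velocity_factor(0.6, 0.3), velocity_factor_direct(0.6, 0.3))
```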

The magnitude of the particle velocity is obtained by noting that the complex number particle velocity is written as

$$\bar v = ve^{j\theta_v} \qquad (180)$$

and from equations (157) through (159) and equation (180) it follows that

$$\bar v^2 = \bar v_x^2 + \bar v_y^2 + \bar v_z^2 \qquad (181)$$

or

$$v^2e^{2j\theta_v} = v_x^2e^{2j\theta_{vx}} + v_y^2e^{2j\theta_{vy}} + v_z^2e^{2j\theta_{vz}} \qquad (182)$$

The component equations corresponding to equation (182) are
$$v^2\cos(2\theta_v) = v_x^2\cos(2\theta_{vx}) + v_y^2\cos(2\theta_{vy}) + v_z^2\cos(2\theta_{vz}) \qquad (183)$$

$$v^2\sin(2\theta_v) = v_x^2\sin(2\theta_{vx}) + v_y^2\sin(2\theta_{vy}) + v_z^2\sin(2\theta_{vz}) \qquad (184)$$

From equations (183) and (184) it follows that

$$v^4 = v_x^4 + v_y^4 + v_z^4 + 2v_x^2v_y^2\cos[2(\theta_{vx} - \theta_{vy})] + 2v_x^2v_z^2\cos[2(\theta_{vx} - \theta_{vz})] + 2v_y^2v_z^2\cos[2(\theta_{vy} - \theta_{vz})] \qquad (185)$$

$$\tan(2\theta_v) = \frac{v_x^2\sin(2\theta_{vx}) + v_y^2\sin(2\theta_{vy}) + v_z^2\sin(2\theta_{vz})}{v_x^2\cos(2\theta_{vx}) + v_y^2\cos(2\theta_{vy}) + v_z^2\cos(2\theta_{vz})} \qquad (186)$$

where $v_x$, $v_y$, $v_z$, $\theta_{vx}$, $\theta_{vy}$, and $\theta_{vz}$ are given by equations (160) through (165) respectively. The measured velocity is $v\cos\theta_v$.
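Equations (185) and (186) combine the three complex velocity components into $v$ and $\theta_v$. The minimal sketch below uses a hypothetical function name and takes the components as (magnitude, phase) pairs. It evaluates both formulas and checks them against equation (182), $v^2e^{2j\theta_v} = \sum_\alpha v_\alpha^2e^{2j\theta_{v\alpha}}$. Only $2\theta_v$ is fixed by (186), so $\theta_v$ is determined up to a multiple of $\pi$; the sketch returns the principal value.

```python
import cmath
import math
from itertools import combinations


def combine_squares(components):
    """Return (v, theta_v) from (185)-(186); components = [(v_a, theta_va), ...]."""
    v4 = sum(m ** 4 for m, _ in components)
    for (mi, ti), (mk, tk) in combinations(components, 2):
        v4 += 2.0 * mi**2 * mk**2 * math.cos(2.0 * (ti - tk))   # cross terms of (185)
    num = sum(m**2 * math.sin(2.0 * t) for m, t in components)
    den = sum(m**2 * math.cos(2.0 * t) for m, t in components)
    # max() guards against a tiny negative v4 from round-off.
    return max(v4, 0.0) ** 0.25, 0.5 * math.atan2(num, den)     # (186)


comps = [(1.0, 0.1), (0.5, -0.2), (0.3, 0.4)]
v, theta_v = combine_squares(comps)
print(v, theta_v)
```

With all phases zero, (185) gives $v^4 = (v_x^2 + v_y^2 + v_z^2)^2$, the ordinary Euclidean speed.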

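The acceleration components are written as

$$\bar a_x = \frac{d\bar v_x}{d\bar t} = a_xe^{j\theta_{ax}} \qquad (187)$$

$$\bar a_y = \frac{d\bar v_y}{d\bar t} = a_ye^{j\theta_{ay}} \qquad (188)$$

$$\bar a_z = \frac{d\bar v_z}{d\bar t} = a_ze^{j\theta_{az}} \qquad (189)$$

where using the Eulerian derivative gives

$$\bar a_x = \frac{\partial\bar v_x}{\partial\bar t} + \bar v_x\frac{\partial\bar v_x}{\partial\bar x} + \bar v_y\frac{\partial\bar v_x}{\partial\bar y} + \bar v_z\frac{\partial\bar v_x}{\partial\bar z} = a_x^{(0)}e^{j\psi_{x0}} + a_x^{(1)}e^{j\psi_{x1}} + a_x^{(2)}e^{j\psi_{x2}} + a_x^{(3)}e^{j\psi_{x3}} \qquad (190)$$

$$\bar a_y = \frac{\partial\bar v_y}{\partial\bar t} + \bar v_x\frac{\partial\bar v_y}{\partial\bar x} + \bar v_y\frac{\partial\bar v_y}{\partial\bar y} + \bar v_z\frac{\partial\bar v_y}{\partial\bar z} = a_y^{(0)}e^{j\psi_{y0}} + a_y^{(1)}e^{j\psi_{y1}} + a_y^{(2)}e^{j\psi_{y2}} + a_y^{(3)}e^{j\psi_{y3}} \qquad (191)$$

$$\bar a_z = \frac{\partial\bar v_z}{\partial\bar t} + \bar v_x\frac{\partial\bar v_z}{\partial\bar x} + \bar v_y\frac{\partial\bar v_z}{\partial\bar y} + \bar v_z\frac{\partial\bar v_z}{\partial\bar z} = a_z^{(0)}e^{j\psi_{z0}} + a_z^{(1)}e^{j\psi_{z1}} + a_z^{(2)}e^{j\psi_{z2}} + a_z^{(3)}e^{j\psi_{z3}} \qquad (192)$$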


where 


where 


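$$\tan\beta_{vx,t} = v_x\,\frac{\partial\theta_{vx}/\partial t}{\partial v_x/\partial t}, \qquad \tan\beta_{vx,x} = v_x\,\frac{\partial\theta_{vx}/\partial x}{\partial v_x/\partial x} \qquad (211)$$

$$\tan\beta_{vx,y} = v_x\,\frac{\partial\theta_{vx}/\partial y}{\partial v_x/\partial y}, \qquad \tan\beta_{vx,z} = v_x\,\frac{\partial\theta_{vx}/\partial z}{\partial v_x/\partial z} \qquad (212)$$

$$\tan\beta_{vy,t} = v_y\,\frac{\partial\theta_{vy}/\partial t}{\partial v_y/\partial t}, \qquad \tan\beta_{vy,x} = v_y\,\frac{\partial\theta_{vy}/\partial x}{\partial v_y/\partial x} \qquad (213)$$

$$\tan\beta_{vy,y} = v_y\,\frac{\partial\theta_{vy}/\partial y}{\partial v_y/\partial y}, \qquad \tan\beta_{vy,z} = v_y\,\frac{\partial\theta_{vy}/\partial z}{\partial v_y/\partial z} \qquad (214)$$

$$\tan\beta_{vz,t} = v_z\,\frac{\partial\theta_{vz}/\partial t}{\partial v_z/\partial t}, \qquad \tan\beta_{vz,x} = v_z\,\frac{\partial\theta_{vz}/\partial x}{\partial v_z/\partial x} \qquad (215)$$

$$\tan\beta_{vz,y} = v_z\,\frac{\partial\theta_{vz}/\partial y}{\partial v_z/\partial y}, \qquad \tan\beta_{vz,z} = v_z\,\frac{\partial\theta_{vz}/\partial z}{\partial v_z/\partial z} \qquad (216)$$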


Equations  (199)  through  (210)  can  be  further  reduced  by  using  equations  (163) 
through  (165). 

Combining  equations  (187)  through  (189)  with  (190)  through  (192)  gives 

$$a_x\cos\theta_{ax} = a_x^{(0)}\cos\psi_{x0} + a_x^{(1)}\cos\psi_{x1} + a_x^{(2)}\cos\psi_{x2} + a_x^{(3)}\cos\psi_{x3} \qquad (217)$$

$$a_x\sin\theta_{ax} = a_x^{(0)}\sin\psi_{x0} + a_x^{(1)}\sin\psi_{x1} + a_x^{(2)}\sin\psi_{x2} + a_x^{(3)}\sin\psi_{x3} \qquad (218)$$

$$a_y\cos\theta_{ay} = a_y^{(0)}\cos\psi_{y0} + a_y^{(1)}\cos\psi_{y1} + a_y^{(2)}\cos\psi_{y2} + a_y^{(3)}\cos\psi_{y3} \qquad (219)$$

$$a_y\sin\theta_{ay} = a_y^{(0)}\sin\psi_{y0} + a_y^{(1)}\sin\psi_{y1} + a_y^{(2)}\sin\psi_{y2} + a_y^{(3)}\sin\psi_{y3} \qquad (220)$$

$$a_z\cos\theta_{az} = a_z^{(0)}\cos\psi_{z0} + a_z^{(1)}\cos\psi_{z1} + a_z^{(2)}\cos\psi_{z2} + a_z^{(3)}\cos\psi_{z3} \qquad (221)$$

$$a_z\sin\theta_{az} = a_z^{(0)}\sin\psi_{z0} + a_z^{(1)}\sin\psi_{z1} + a_z^{(2)}\sin\psi_{z2} + a_z^{(3)}\sin\psi_{z3} \qquad (222)$$
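Equations (217) through (222) state that the four Eulerian contributions add as complex numbers. The sketch below uses a hypothetical function name and takes the terms as $(a_x^{(k)}, \psi_{xk})$ pairs. It recovers $a_x$ and $\theta_{ax}$ for one component. When only the $k = 0$ term is present, it returns $a_x^{(0)}$ and $\psi_{x0}$, as expected in the case of no spatial variation of the velocity field discussed below.

```python
import math


def combine_terms(terms):
    """Return (a, theta_a) satisfying (217)-(218) for terms [(a_k, psi_k), ...]."""
    re = sum(a * math.cos(psi) for a, psi in terms)   # right side of (217)
    im = sum(a * math.sin(psi) for a, psi in terms)   # right side of (218)
    return math.hypot(re, im), math.atan2(im, re)


terms = [(2.0, 0.1), (0.4, 1.2), (0.3, -0.5), (0.1, 2.0)]
a_x, theta_ax = combine_terms(terms)
print(a_x, theta_ax)
```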

These equations can be used to determine $a_x$, $a_y$, $a_z$, $\theta_{ax}$, $\theta_{ay}$, and $\theta_{az}$. For the special case when there is no spatial variation of the velocity field it follows that

$$a_\alpha = a_\alpha^{(0)}, \qquad \theta_{a\alpha} = \psi_{\alpha 0}, \qquad \alpha = x, y, z \qquad (223)\text{ through }(228)$$

The complex magnitude of the particle acceleration is written as

$$\bar a = ae^{j\theta_a} \qquad (229)$$

and from equations (187) through (189) it follows that

$$\bar a^2 = \bar a_x^2 + \bar a_y^2 + \bar a_z^2 \qquad (230)$$

The component equations corresponding to equation (230) are

$$a^2\cos(2\theta_a) = a_x^2\cos(2\theta_{ax}) + a_y^2\cos(2\theta_{ay}) + a_z^2\cos(2\theta_{az}) \qquad (231)$$

$$a^2\sin(2\theta_a) = a_x^2\sin(2\theta_{ax}) + a_y^2\sin(2\theta_{ay}) + a_z^2\sin(2\theta_{az}) \qquad (232)$$

It follows from equations (231) and (232) that

$$a^4 = a_x^4 + a_y^4 + a_z^4 + 2a_x^2a_y^2\cos[2(\theta_{ax} - \theta_{ay})] + 2a_x^2a_z^2\cos[2(\theta_{ax} - \theta_{az})] + 2a_y^2a_z^2\cos[2(\theta_{ay} - \theta_{az})] \qquad (233)$$

and

$$\tan(2\theta_a) = \frac{a_x^2\sin(2\theta_{ax}) + a_y^2\sin(2\theta_{ay}) + a_z^2\sin(2\theta_{az})}{a_x^2\cos(2\theta_{ax}) + a_y^2\cos(2\theta_{ay}) + a_z^2\cos(2\theta_{az})} \qquad (234)$$

where $a_x$, $a_y$, $a_z$, and $\theta_{ax}$, $\theta_{ay}$, $\theta_{az}$ are given by equations (217) through (222). The measured acceleration is $a\cos\theta_a$.

For a particle moving in bulk matter or vacuum with broken symmetry and not acted upon by forces, the momentum is constant and equation (176) gives

$$m\gamma_xv_x = C_{vx} \qquad (235)$$

$$\theta_{vx} + \theta_{\gamma x} = C_{\theta vx} \qquad (236)$$

where $C_{vx}$ and $C_{\theta vx}$ are constants of the motion. Equations (160), (163), (235), and (236) give

$$m^2\gamma_x^2\,\frac{(dx/dt)^2 + x^2\omega_{\theta x}^2}{1 + t^2\omega_{\theta t}^2} = C_{vx}^2 \qquad (237)$$

$$\theta_{\gamma x} + \theta_x - \theta_t + \beta_{x,t} - \beta_{t,t} = C_{\theta vx} \qquad (238)$$


Equation (237) shows that there is a transfer of energy between the linear motion and the internal phase motion. Equation (238) shows that there is also a transfer between $\theta_x$ and $\theta_t$, because equation (238) can be rewritten as

$$\theta_{\gamma x} + \theta_x + \tan^{-1}\left(x\,\frac{d\theta_x/dt}{dx/dt}\right) - \theta_t - \tan^{-1}\left(t\,\frac{d\theta_t}{dt}\right) = C_{\theta vx} \qquad (239)$$


The nonrelativistic equations of motion of a particle moving in a potential field are given by

$$m\ddot x^a = ma_x^a = -\partial W^a/\partial x^a \qquad (240)$$

$$m\ddot y^a = ma_y^a = -\partial W^a/\partial y^a \qquad (241)$$

$$m\ddot z^a = ma_z^a = -\partial W^a/\partial z^a \qquad (242)$$

The corresponding relativistic equations of motion for particles in a medium not having broken internal symmetry are

$$m(\gamma^a)^3a_x^a = -\partial W^a/\partial x^a \qquad (243)$$

$$m\gamma^aa_y^a = -\partial W^a/\partial y^a \qquad (244)$$

$$m\gamma^aa_z^a = -\partial W^a/\partial z^a \qquad (245)$$


where the standard velocity factor is given by equation (175). Consider now a conservative force acting on a particle located in bulk matter or vacuum with broken internal symmetries in the space and time coordinates. If the complex number potential is written as

$$\bar W = We^{j\theta_W} \qquad (246)$$

then the nonrelativistic equations of motion are written as

$$m\bar a_x = ma_xe^{j\theta_{ax}} = -\partial\bar W/\partial\bar x \qquad (247)$$

$$m\bar a_y = ma_ye^{j\theta_{ay}} = -\partial\bar W/\partial\bar y \qquad (248)$$

$$m\bar a_z = ma_ze^{j\theta_{az}} = -\partial\bar W/\partial\bar z \qquad (249)$$

where $\bar a_x$, $\bar a_y$, and $\bar a_z$ are given by equations (190) through (192), and $a_x$, $a_y$, $a_z$, $\theta_{ax}$, $\theta_{ay}$, $\theta_{az}$ are obtained from equations (217) through (222). If the theory of special relativity is considered in conjunction with broken internal symmetry, equations (247) through (249) become


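$$m\bar\gamma_x^3\bar a_x = m\gamma_x^3a_xe^{j(\theta_{ax} + 3\theta_{\gamma x})} = -\partial\bar W/\partial\bar x \qquad (250)$$

$$m\bar\gamma_x\bar a_y = m\gamma_xa_ye^{j(\theta_{ay} + \theta_{\gamma x})} = -\partial\bar W/\partial\bar y \qquad (251)$$

$$m\bar\gamma_x\bar a_z = m\gamma_xa_ze^{j(\theta_{az} + \theta_{\gamma x})} = -\partial\bar W/\partial\bar z \qquad (252)$$

where $\bar\gamma_x$ is given by equation (177), $\gamma_x$ and $\theta_{\gamma x}$ are given by equations (178) and (179) respectively, and where the particle is moving instantaneously along the $x$ axis with velocity $\bar v_x$. Equations (250) through (252) are simple generalizations of the standard special relativistic inertia terms to the case of particle motion in media with broken internal symmetries.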

The  derivatives  of  the  broken  symmetry  potential  can  be  written  as 




(253) 
where

3W/3z 


(258) 


and where $\beta_{x,x}$, $\beta_{y,y}$, and $\beta_{z,z}$ are given by equations (38) through (40) respectively. Combining equations (187) through (198) with equations (250) through (255) gives the following relativistic equations of motion for a particle located in bulk matter or vacuum with broken internal symmetries.


(259) 


(264) 


where $\gamma_x$ and $\theta_{\gamma x}$ are given by equations (178) and (179) respectively, and where $a_x$, $a_y$, $a_z$ and $\theta_{ax}$, $\theta_{ay}$, $\theta_{az}$ are obtained from equations (217) through (222).


A useful form of the nonrelativistic equations of motion for a particle located in asymmetric matter is obtained from equation (247) as follows

$$m\,\frac{d^2\bar x}{d\bar t^2} = -\frac{\partial\bar W}{\partial\bar x} \qquad (264A)$$

where $\bar x$ is a complex number given by equation (16) whose real and imaginary components are written as $x_R = x\cos\theta_x$ and $x_I = x\sin\theta_x$. Then it follows that

(264D)


^2  ^2 

h.t  TT  +  '°=  7T^ 

at  at 


(264H) 


?  ,2 

cos  ^  sin  f0  -  20  ,.  )  — ^ 

C,t  X  dt  ,^2 

dt 


The derivative of the complex potential can be written by noting that $W_R = W\cos\theta_W$ and $W_I = W\sin\theta_W$, so that

(264I)


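(264J)

$$R_{1x}(W_R, W_I) = \cos\beta_{x,x}\left(\cos\theta_x\frac{\partial W_R}{\partial x} + \sin\theta_x\frac{\partial W_I}{\partial x}\right) \approx \cos\beta_{x,x}\cos(\theta_W - \theta_x)\frac{\partial W}{\partial x} \qquad (264K)$$

$$I_{1x}(W_R, W_I) = \cos\beta_{x,x}\left(-\sin\theta_x\frac{\partial W_R}{\partial x} + \cos\theta_x\frac{\partial W_I}{\partial x}\right) \approx \cos\beta_{x,x}\sin(\theta_W - \theta_x)\frac{\partial W}{\partial x} \qquad (264L)$$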


Newton's dynamical equation (264A) can now be written in the following approximate forms

$$mR_{2t}(x_R, x_I) \approx -R_{1x}(W_R, W_I) \qquad (264M)$$

$$mI_{2t}(x_R, x_I) \approx -I_{1x}(W_R, W_I) \qquad (264N)$$

where $R_{2t}$ and $I_{2t}$ are given by equations (264G) and (264H) respectively, and $R_{1x}$ and $I_{1x}$ are given by equations (264K) and (264L) respectively. A further approximation for relations (264M) and (264N) yields


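$$m\cos^2\beta_{t,t}\cos(\theta_x - 2\theta_t)\frac{d^2x}{dt^2} \approx -\cos\beta_{x,x}\cos(\theta_W - \theta_x)\frac{\partial W}{\partial x} \qquad (264O)$$

$$m\cos^2\beta_{t,t}\sin(\theta_x - 2\theta_t)\frac{d^2x}{dt^2} \approx -\cos\beta_{x,x}\sin(\theta_W - \theta_x)\frac{\partial W}{\partial x} \qquad (264P)$$

which gives

$$\theta_x - 2\theta_t \approx \theta_W - \theta_x \qquad (264Q)$$

$$m\cos^2\beta_{t,t}\frac{d^2x}{dt^2} \approx -\cos\beta_{x,x}\frac{\partial W}{\partial x} \qquad (264R)$$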


which are the approximate equations of motion for a particle in a potential field that is located in asymmetric bulk matter or vacuum. For a nonrelativistic system the measured acceleration is given by

$$a_{xm} = a_x\cos\theta_{ax} \approx \cos^2\beta_{t,t}\cos\theta_{ax}\frac{d^2x}{dt^2} \qquad (264S)$$

while the conventionally calculated acceleration is given by

$$a_x^a = \frac{d^2x_m}{dt_m^2} = \frac{\cos\theta_x}{\cos^2\theta_t}\frac{d^2x}{dt^2} \qquad (264T)$$

and therefore $a_{xm} \neq a_x^a$. Relations (264Q) and (264R) can be applied to many specific dynamical systems that are located in an asymmetric medium. For instance the vibration of molecules and atoms located in matter is expected to be described by these equations.
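Relations (264Q) and (264R) can be solved directly once $\theta_W$, $\theta_t$, $\beta_{x,x}$, $\beta_{t,t}$, and $\partial W/\partial x$ are specified. The minimal sketch below uses hypothetical names and illustrative values. Relation (264Q) fixes $\theta_x$ only modulo $\pi$, so the principal solution $\theta_x = (\theta_W + 2\theta_t)/2$ is taken; (264R) then gives $d^2x/dt^2$. The check confirms that the component relations (264O) and (264P), as written above, are both satisfied by the result.

```python
import math


def approximate_motion(m, dW_dx, theta_W, theta_t, beta_xx, beta_tt):
    """Principal solution of (264Q) and the acceleration from (264R)."""
    theta_x = 0.5 * (theta_W + 2.0 * theta_t)                            # (264Q)
    x_ddot = -math.cos(beta_xx) * dW_dx / (m * math.cos(beta_tt) ** 2)   # (264R)
    return theta_x, x_ddot


# Illustrative (hypothetical) parameter values.
m, dW_dx, theta_W, theta_t, beta_xx, beta_tt = 2.0, 0.8, 0.3, 0.05, 0.1, 0.2
theta_x, x_ddot = approximate_motion(m, dW_dx, theta_W, theta_t, beta_xx, beta_tt)

# Left and right sides of (264O); they should agree for these values.
lhs_O = m * math.cos(beta_tt) ** 2 * math.cos(theta_x - 2 * theta_t) * x_ddot
rhs_O = -math.cos(beta_xx) * math.cos(theta_W - theta_x) * dW_dx
print(theta_x, x_ddot, lhs_O, rhs_O)
```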


There are twenty three unknown variables that are needed to describe a particle in bulk matter or vacuum with broken internal symmetries: $x$, $y$, $z$, $\theta_x$, $\theta_y$, $\theta_z$, $\theta_t$, $v_x$, $v_y$, $v_z$, $\theta_{vx}$, $\theta_{vy}$, $\theta_{vz}$, $a_x$, $\theta_{ax}$, $a_y$, $\theta_{ay}$, $a_z$, $\theta_{az}$, $P$, $\theta_P$, $\gamma$, and $\theta_\gamma$. The magnitude of the time $t$ is taken to be a totally independent parameter. Twenty two equations have been derived thus far in an attempt to determine the twenty three unknowns:

- two ground state relativistic equations (1),
- two equations for the ground state Grüneisen parameter (5),
- the six kinematic velocity equations (160) through (165),
- the six kinematic acceleration equations (217) through (222), and
- the six dynamical equations (259) through (264).

By means of the twenty two equations the kinematical and dynamical variables have been expressed in terms of the potential components $W$ and $\theta_W$. But the single particle potential parameters $W$ and $\theta_W$ are related through a gauge potential to the complex macroscopic state equation variables $P$, $\theta_P$, $\gamma$, and $\theta_\gamma$ that are determined from equations (1) and (5). This connection is made through a partition function as shown in equations (13) through (15), which determine the gauge potential. But through equation (1), $P$, $\theta_P$, $\gamma$, and $\theta_\gamma$ are related to the unrenormalized pressure and Grüneisen function $P^a$ and $\gamma^a$ respectively. Therefore it should be possible to express all of the kinematical and dynamical variables in terms of $t$, $P^a$, $\gamma^a$, and the unrenormalized potential $V^a$.

Clearly one additional equation is necessary in order to have a total of twenty three equations, as needed to determine the twenty three unknown variables. The needed equation is given by the following complex number continuity equation for broken symmetry matter

$$\frac{\partial\bar\rho}{\partial\bar t} + \bar\nabla\cdot(\bar\rho\,\bar{\mathbf v}) = 0 \qquad (265)$$

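where $\rho$ = mass density. Equation (265) can be rewritten as two real number equations as follows

$$G_t\cos\phi_t + G_x\cos\phi_x + G_y\cos\phi_y + G_z\cos\phi_z = 0 \qquad (266)$$

$$G_t\sin\phi_t + G_x\sin\phi_x + G_y\sin\phi_y + G_z\sin\phi_z = 0 \qquad (267)$$

where

(268A)

(268B)

(268C)

(268D)

(268E)

$$\phi_x = \theta_{vx} + \beta_{\rho vx,x} - \theta_x - \beta_{x,x} \qquad (268F)$$

$$\phi_y = \theta_{vy} + \beta_{\rho vy,y} - \theta_y - \beta_{y,y} \qquad (268G)$$

$$\phi_z = \theta_{vz} + \beta_{\rho vz,z} - \theta_z - \beta_{z,z} \qquad (268H)$$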


where

$$\tan\beta_{\rho v\alpha,\alpha} = \rho v_\alpha\,\frac{\partial\theta_{v\alpha}/\partial\alpha}{\partial(\rho v_\alpha)/\partial\alpha} \qquad (269)$$

where $\alpha = x, y, z$. Therefore the complex number equation (265) has two real components that can be used along with the previously elaborated twenty two equations to obtain $\rho$ and $\theta_\rho$. There are now a total of twenty four equations and twenty four unknown quantities, which can be determined to give


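$$x = x(t, P^a, \gamma^a, V^a), \qquad \theta_x = \theta_x(t, P^a, \gamma^a, V^a) \qquad (270A)$$

$$y = y(t, P^a, \gamma^a, V^a), \qquad \theta_y = \theta_y(t, P^a, \gamma^a, V^a) \qquad (270B)$$

$$z = z(t, P^a, \gamma^a, V^a), \qquad \theta_z = \theta_z(t, P^a, \gamma^a, V^a) \qquad (270C)$$

$$v_x = v_x(t, P^a, \gamma^a, V^a), \qquad \theta_{vx} = \theta_{vx}(t, P^a, \gamma^a, V^a) \qquad (271A)$$

$$v_y = v_y(t, P^a, \gamma^a, V^a), \qquad \theta_{vy} = \theta_{vy}(t, P^a, \gamma^a, V^a) \qquad (271B)$$

$$v_z = v_z(t, P^a, \gamma^a, V^a), \qquad \theta_{vz} = \theta_{vz}(t, P^a, \gamma^a, V^a) \qquad (271C)$$

$$a_x = a_x(t, P^a, \gamma^a, V^a), \qquad \theta_{ax} = \theta_{ax}(t, P^a, \gamma^a, V^a) \qquad (272A)$$

$$a_y = a_y(t, P^a, \gamma^a, V^a), \qquad \theta_{ay} = \theta_{ay}(t, P^a, \gamma^a, V^a) \qquad (272B)$$

$$a_z = a_z(t, P^a, \gamma^a, V^a), \qquad \theta_{az} = \theta_{az}(t, P^a, \gamma^a, V^a) \qquad (272C)$$

$$P = P(P^a, \gamma^a, V^a), \qquad \theta_P = \theta_P(P^a, \gamma^a, V^a) \qquad (273A)$$

$$\gamma = \gamma(P^a, \gamma^a, V^a), \qquad \theta_\gamma = \theta_\gamma(P^a, \gamma^a, V^a) \qquad (273B)$$

(274A)

$$\rho = \rho(t, P^a, \gamma^a, V^a) \qquad (274B)$$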


where time is treated as an independent variable, and where $V^a$ = unrenormalized potential.


The first integral of equations (247) through (249) is

$$\tfrac{1}{2}m\bar v^2 + \bar W = \bar E \qquad (275)$$

where $\bar E$ = complex number total energy, and where $\bar v$ is given by equation (181). The two scalar component equations corresponding to equation (275) are

$$\tfrac{1}{2}mv^2\cos(2\theta_v) + W\cos\theta_W = E\cos\theta_E \qquad (276)$$

$$\tfrac{1}{2}mv^2\sin(2\theta_v) + W\sin\theta_W = E\sin\theta_E \qquad (277)$$

where $v$ and $\theta_v$ are given by equations (185) and (186) respectively. The corresponding first integral of the relativistic equations of motion (259) through (261) is

$$(\bar\gamma_x - 1)mc^2 + \bar W = \bar E \qquad (278)$$

where the particle is instantaneously moving along the $x$ axis and $\bar\gamma_x$ is given by equation (177). The component form of equation (278) is written as

$$(\gamma_x\cos\theta_{\gamma x} - 1)mc^2 + W\cos\theta_W = E\cos\theta_E \qquad (279)$$

$$\gamma_x\sin\theta_{\gamma x}\,mc^2 + W\sin\theta_W = E\sin\theta_E \qquad (280)$$


The  energy  equations  determine  the  magnitudes  of  coordinates  and  velocities. 
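Equations (279) and (280) are the real and imaginary parts of (278). The minimal sketch below uses a hypothetical function name and sets $c = 1$ by default. It returns $E$ and $\theta_E$ from given $\gamma_x$, $\theta_{\gamma x}$, $W$, and $\theta_W$. When all internal phase angles vanish it reduces to the ordinary relativistic energy balance $(\gamma_x - 1)mc^2 + W = E$.

```python
import cmath
import math


def total_energy(gamma_x, theta_gx, W, theta_W, m, c=1.0):
    """Magnitude E and phase theta_E of the complex total energy, (279)-(280)."""
    re = (gamma_x * math.cos(theta_gx) - 1.0) * m * c**2 + W * math.cos(theta_W)  # (279)
    im = gamma_x * math.sin(theta_gx) * m * c**2 + W * math.sin(theta_W)          # (280)
    return math.hypot(re, im), math.atan2(im, re)


# Ordinary case: E = (gamma - 1) m c^2 + W, with zero phase.
print(total_energy(1.25, 0.0, 0.3, 0.0, m=2.0))
print(total_energy(1.1, 0.2, 0.5, -0.4, m=1.5))
```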

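5. ROTATING SYSTEMS IN BULK MATTER AND THE VACUUM. In Section 3 it was shown that the angle between two lines located in bulk matter or the vacuum with broken internal symmetry is expected to have an internal phase angle. This suggests that angular velocity also has a broken internal symmetry. Accordingly the angular speed associated with a complex number geometrical angle given by

$$\bar\phi = \phi e^{j\theta_\phi} \qquad (281)$$

is written as

$$\bar\omega = \omega e^{j\theta_\omega} = \frac{d\bar\phi}{d\bar t} = e^{j(\theta_\phi - \theta_t)}\left(\frac{d\phi + j\phi\,d\theta_\phi}{dt + jt\,d\theta_t}\right) \qquad (282)$$

so that

$$\omega = \left[\frac{\omega_\phi^2 + \phi^2\omega_{\theta\phi}^2}{1 + t^2\omega_{\theta t}^2}\right]^{1/2} \qquad (283)$$

and

$$\theta_\omega = \theta_\phi + \beta_{\phi,t} - \theta_t - \beta_{t,t} \qquad (284)$$

where

$$\tan\beta_{\phi,t} = \phi\,\frac{d\theta_\phi/dt}{d\phi/dt} \qquad (285)$$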


The angular speed associated with the internal phase angle of the geometrical angle is written as

$$\omega_{\theta\phi} = \frac{d\theta_\phi}{dt} = \frac{\partial\theta_\phi}{\partial t} + \frac{\partial\theta_\phi}{\partial\phi}\frac{d\phi}{dt} \qquad (286)$$

and the angular speed of the internal phase angle of the time coordinate is given in equation (173A) or equivalently by

$$\omega_{\theta t} = \frac{d\theta_t}{dt} = \frac{\partial\theta_t}{\partial t} + \frac{\partial\theta_t}{\partial r}\frac{dr}{dt} + \frac{\partial\theta_t}{\partial\phi}\frac{d\phi}{dt} \qquad (286A)$$

and finally where

$$\omega_\phi = \frac{d\phi}{dt} \qquad (287)$$

is the ordinary angular speed associated with the magnitude $\phi$ of the geometrical angle. Equation (283) is the general expression for angular speed within bulk matter or vacuum with broken internal symmetries. The measured angular speed is given by $\omega\cos\theta_\omega$.

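For short periods of time equation (283) shows that

$$\omega \approx \omega_\phi \qquad (288)$$

while for long periods of time

$$\omega \approx \langle\omega_\phi\rangle\,\frac{\omega_{\theta\phi}}{\omega_{\theta t}} \qquad (289)$$

where

$$\langle\omega_\phi\rangle = \frac{1}{t}\int_0^t\omega_\phi\,dt \qquad (290)$$

In fact equation (283) shows that for a small $t$

$$\omega \approx \omega_\phi \qquad (291)$$

provided $\phi^2\omega_{\theta\phi}^2/\omega_\phi^2 \ll 1$ and $t^2\omega_{\theta t}^2 \ll 1$, while for large $t$

$$\omega = \langle\omega_\phi\rangle\,\frac{\omega_{\theta\phi}}{\omega_{\theta t}}\left[1 + \frac{\omega_\phi^2}{2\phi^2\omega_{\theta\phi}^2} + \cdots\right]\left[1 - \frac{1}{2t^2\omega_{\theta t}^2} + \cdots\right] \qquad (292)$$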
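The limiting forms (288) and (289) follow from the magnitude formula (283). The sketch below evaluates (283) for a hypothetical uniformly advancing angle, $\phi = \omega_\phi t$ with $\phi(0) = 0$, so that $\langle\omega_\phi\rangle = \omega_\phi$ by (290). The parameter values are illustrative only, and the function name is an assumption.

```python
import math


def angular_speed(phi, w_phi, w_th_phi, t, w_th_t):
    """Magnitude of the complex angular speed, equation (283)."""
    return math.hypot(w_phi, phi * w_th_phi) / math.hypot(1.0, t * w_th_t)


w_phi, w_th_phi, w_th_t = 2.0, 3.0, 0.5

# Short times: omega approaches omega_phi = 2, as in (288).
t = 1e-4
print(angular_speed(w_phi * t, w_phi, w_th_phi, t, w_th_t))

# Long times: omega approaches <omega_phi> * w_th_phi / w_th_t = 12, as in (289).
t = 1e4
print(angular_speed(w_phi * t, w_phi, w_th_phi, t, w_th_t))
```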


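Consider now the velocity of a particle in bulk matter or vacuum that has a radial and a transverse component. The radial component is given by

$$\bar v_r = v_re^{j\theta_{vr}} = \frac{d\bar r}{d\bar t} = e^{j(\theta_r - \theta_t)}\left(\frac{dr + jr\,d\theta_r}{dt + jt\,d\theta_t}\right) \qquad (293)$$

so that

$$v_r = \left[\frac{(dr/dt)^2 + r^2\omega_{\theta r}^2}{1 + t^2\omega_{\theta t}^2}\right]^{1/2} \qquad (294)$$

$$\theta_{vr} = \theta_r + \beta_{r,t} - \theta_t - \beta_{t,t} \qquad (295)$$

where

$$\tan\beta_{r,t} = r\,\frac{d\theta_r/dt}{dr/dt} \qquad (296)$$

$$\omega_{\theta r} = \frac{d\theta_r}{dt} = \frac{\partial\theta_r}{\partial t} + \frac{\partial\theta_r}{\partial r}\frac{dr}{dt} + \frac{\partial\theta_r}{\partial\phi}\frac{d\phi}{dt} \qquad (297)$$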

The transverse component of velocity is given by

$$\bar v_\phi = v_\phi e^{j\theta_{v\phi}} = \bar r\,\frac{d\bar\phi}{d\bar t} = \bar r\,\bar\omega \qquad (298)$$

Combining equations (282) and (298) gives

$$v_\phi = r\omega \qquad (299)$$

$$\theta_{v\phi} = \theta_r + \theta_\phi + \beta_{\phi,t} - \theta_t - \beta_{t,t} = \theta_r + \theta_\omega \qquad (300)$$

where $\omega$ is given by equation (283). The magnitude of the vector sum of the radial and transverse velocities is given by

$$\bar v^2 = \bar v_r^2 + \bar v_\phi^2 = v^2e^{2j\theta_v} \qquad (301)$$

which has the following scalar components

$$v^2\cos(2\theta_v) = v_r^2\cos(2\theta_{vr}) + v_\phi^2\cos(2\theta_{v\phi}) \qquad (302)$$

$$v^2\sin(2\theta_v) = v_r^2\sin(2\theta_{vr}) + v_\phi^2\sin(2\theta_{v\phi}) \qquad (303)$$

From equations (302) and (303) it follows that
$$v^4 = v_r^4 + v_\phi^4 + 2v_r^2v_\phi^2\cos[2(\theta_{vr} - \theta_{v\phi})] \qquad (304)$$

$$\tan(2\theta_v) = \frac{v_r^2\sin(2\theta_{vr}) + v_\phi^2\sin(2\theta_{v\phi})}{v_r^2\cos(2\theta_{vr}) + v_\phi^2\cos(2\theta_{v\phi})} \qquad (305)$$

where

$$\theta_{vr} - \theta_{v\phi} = \beta_{r,t} - \theta_\phi - \beta_{\phi,t} \qquad (306)$$

The measured speed is $v\cos\theta_v$.

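For ordinary matter rotating about a center of force, the radial and transverse accelerations are written as

$$a_r^a = \frac{d^2r^a}{(dt^a)^2} - r^a(\omega^a)^2 \qquad (307)$$

$$a_\phi^a = r^a\frac{d^2\phi^a}{(dt^a)^2} + 2\omega^a\frac{dr^a}{dt^a} \qquad (308)$$

A particle that is orbiting about a center of force located within bulk matter or the vacuum with broken internal symmetry will have the following radial and transverse components

$$\bar a_r = a_re^{j\theta_{ar}} = \frac{d^2\bar r}{d\bar t^2} - \bar r\,\bar\omega^2 \qquad (309)$$

$$\bar a_\phi = a_\phi e^{j\theta_{a\phi}} = \bar r\,\frac{d^2\bar\phi}{d\bar t^2} + 2\bar\omega\,\frac{d\bar r}{d\bar t} \qquad (310)$$

Each of the four terms in equations (309) and (310) can be evaluated in terms of previously calculated quantities.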

The linear radial acceleration term in equation (309) is given by

$$\frac{d^2\bar r}{d\bar t^2} = \frac{d\bar v_r}{d\bar t} \equiv \bar a_{rr} = a_{rr}e^{j\theta_{arr}} \qquad (311)$$

where the time derivative of the radial velocity is given by the Eulerian derivative as follows

(312)


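(313A)

(313B)

(313C)

$$\psi_{r0} = \theta_{vr} + \beta_{vr,t} - \theta_t - \beta_{t,t} \qquad (313D)$$

$$\psi_{r1} = 2\theta_{vr} - \theta_r + \beta_{vr,r} - \beta_{r,r} \qquad (313E)$$

$$\psi_{r2} = \theta_{v\phi} + \theta_{vr} - \theta_r - \theta_\phi + \beta_{vr,\phi} - \beta_{\phi,\phi} \qquad (313F)$$

where $\beta_{t,t}$, $\beta_{r,r}$, and $\beta_{\phi,\phi}$ are given by equations (41), (64), and (66) respectively, $v_r$ and $\theta_{vr}$ are given by equations (294) and (295) respectively, $v_\phi$ and $\theta_{v\phi}$ are given by equations (299) and (300) respectively, and where

$$\tan\beta_{vr,t} = v_r\,\frac{\partial\theta_{vr}/\partial t}{\partial v_r/\partial t} \qquad (314A)$$

$$\tan\beta_{vr,r} = v_r\,\frac{\partial\theta_{vr}/\partial r}{\partial v_r/\partial r} \qquad (314B)$$

$$\tan\beta_{vr,\phi} = v_r\,\frac{\partial\theta_{vr}/\partial\phi}{\partial v_r/\partial\phi} \qquad (314C)$$

From equations (311) and (312) it follows that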


$$a_{rr}\cos\theta_{arr} = a_{rr}^{(0)}\cos\psi_{r0} + a_{rr}^{(1)}\cos\psi_{r1} + a_{rr}^{(2)}\cos\psi_{r2} \qquad (315)$$

$$a_{rr}\sin\theta_{arr} = a_{rr}^{(0)}\sin\psi_{r0} + a_{rr}^{(1)}\sin\psi_{r1} + a_{rr}^{(2)}\sin\psi_{r2} \qquad (316)$$

from which $a_{rr}$ and $\theta_{arr}$ can be obtained immediately. For the special case where there is no spatial variation of the velocity field the acceleration equations become

$$a_{rr} = a_{rr}^{(0)} \qquad (317)$$

$$\theta_{arr} = \psi_{r0} \qquad (318)$$


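The centrifugal radial acceleration term in equation (309) is written as

$$\bar r\,\bar\omega^2 = r\omega^2e^{j(\theta_r + 2\theta_\omega)} = a_{cen}e^{j\theta_{acen}} \qquad (319)$$

so that

$$a_{cen} = r\omega^2 \qquad (320)$$

$$\theta_{acen} = \theta_r + 2\theta_\omega = \theta_r + 2(\theta_\phi + \beta_{\phi,t} - \theta_t - \beta_{t,t}) \qquad (321)$$

where $\omega$ and $\theta_\omega$ are given by equations (283) and (284) respectively.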

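The first term in the angular acceleration given by equation (310) is written as

$$\bar r\,\frac{d^2\bar\phi}{d\bar t^2} = \bar r\,\frac{d\bar\omega}{d\bar t} = a_{\phi\phi}e^{j\theta_{a\phi\phi}} \qquad (322)$$

where $\bar\omega$ is given by equation (282). The time derivative is taken to be an Eulerian derivative (which accounts for differential rotation) as follows

$$\bar r\,\frac{d\bar\omega}{d\bar t} = \bar r\left(\frac{\partial\bar\omega}{\partial\bar t} + \bar v_r\frac{\partial\bar\omega}{\partial\bar r} + \frac{\bar v_\phi}{\bar r}\frac{\partial\bar\omega}{\partial\bar\phi}\right) = \bar a_{\phi\phi}^{(0)} + \bar a_{\phi\phi}^{(1)} + \bar a_{\phi\phi}^{(2)} \qquad (323)$$

where

(324A)

(324B)

(324C)

$$\psi_{\phi0} = \theta_r + \theta_\omega + \beta_{\omega,t} - \theta_t - \beta_{t,t} \qquad (324D)$$

$$\psi_{\phi1} = \theta_{vr} + \theta_\omega + \beta_{\omega,r} - \beta_{r,r} \qquad (324E)$$

$$\psi_{\phi2} = \theta_{v\phi} + \theta_\omega + \beta_{\omega,\phi} - \theta_\phi - \beta_{\phi,\phi} \qquad (324F)$$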
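where

$$\tan\beta_{\omega,t} = \omega\,\frac{\partial\theta_\omega/\partial t}{\partial\omega/\partial t} \qquad (325A)$$

$$\tan\beta_{\omega,r} = \omega\,\frac{\partial\theta_\omega/\partial r}{\partial\omega/\partial r} \qquad (325B)$$

$$\tan\beta_{\omega,\phi} = \omega\,\frac{\partial\theta_\omega/\partial\phi}{\partial\omega/\partial\phi} \qquad (325C)$$

From equations (322) and (323) it follows that

$$a_{\phi\phi}\cos\theta_{a\phi\phi} = a_{\phi\phi}^{(0)}\cos\psi_{\phi0} + a_{\phi\phi}^{(1)}\cos\psi_{\phi1} + a_{\phi\phi}^{(2)}\cos\psi_{\phi2} \qquad (326)$$

$$a_{\phi\phi}\sin\theta_{a\phi\phi} = a_{\phi\phi}^{(0)}\sin\psi_{\phi0} + a_{\phi\phi}^{(1)}\sin\psi_{\phi1} + a_{\phi\phi}^{(2)}\sin\psi_{\phi2} \qquad (327)$$

from which $a_{\phi\phi}$ and $\theta_{a\phi\phi}$ can be immediately obtained. For the special case where there is no spatial variation of the angular velocity (uniform rotation) it follows that

$$a_{\phi\phi} = a_{\phi\phi}^{(0)} \qquad (328)$$

$$\theta_{a\phi\phi} = \psi_{\phi0} \qquad (329)$$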

Finally the Coriolis term in equation (310) is written as

$$2\bar\omega\,\bar v_r = a_ce^{j\theta_{ac}} \qquad (330)$$

and therefore for the Coriolis acceleration

$$a_c = 2\omega v_r \qquad (331)$$

$$\theta_{ac} = \theta_\omega + \theta_{vr} = \theta_r + \beta_{r,t} + \theta_\phi + \beta_{\phi,t} - 2(\theta_t + \beta_{t,t}) \qquad (332)$$

where $v_r$ is given by equation (294), and $\theta_\omega$ and $\theta_{vr}$ by equations (284) and (295) respectively. Combining equations (284), (324D), (329), and (332) shows that for uniform rotation

$$\theta_{a\phi\phi} = \theta_{ac} - \beta_{r,t} + \beta_{\omega,t} \qquad (333)$$
All of the terms in equations (309) and (310) have been evaluated and these equations can be written as

$$\bar a_r = a_re^{j\theta_{ar}} = a_{rr}e^{j\theta_{arr}} - a_{cen}e^{j\theta_{acen}} \qquad (334)$$

$$\bar a_\phi = a_\phi e^{j\theta_{a\phi}} = a_{\phi\phi}e^{j\theta_{a\phi\phi}} + a_ce^{j\theta_{ac}} \qquad (335)$$

The magnitudes and internal phase angles of the radial and transverse components of the acceleration, $a_r$, $\theta_{ar}$, $a_\phi$, and $\theta_{a\phi}$, are yet to be calculated. This is done using equations (334) and (335). From equation (334) it follows that

$$a_r\cos\theta_{ar} = a_{rr}\cos\theta_{arr} - a_{cen}\cos\theta_{acen} \qquad (336)$$

$$a_r\sin\theta_{ar} = a_{rr}\sin\theta_{arr} - a_{cen}\sin\theta_{acen} \qquad (337)$$

and

$$\tan\theta_{ar} = \frac{a_{rr}\sin\theta_{arr} - a_{cen}\sin\theta_{acen}}{a_{rr}\cos\theta_{arr} - a_{cen}\cos\theta_{acen}} \qquad (338)$$

$$a_r^2 = a_{rr}^2 + a_{cen}^2 - 2a_{rr}a_{cen}\cos(\theta_{arr} - \theta_{acen}) \qquad (339)$$
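Equations (320), (321), and (336) through (339) give the radial acceleration from the linear and centrifugal terms. The sketch below uses a hypothetical function name. It takes $a_{rr}$ and $\theta_{arr}$ as given (for example from (317) and (318) for a uniform velocity field) and returns $a_r$ and $\theta_{ar}$. The accompanying check confirms the law-of-cosines form (339). Using `atan2` in place of the bare tangent in (338) is an implementation choice that fixes the quadrant.

```python
import math


def radial_acceleration(a_rr, th_arr, r, omega, th_r, th_omega):
    """Return (a_r, theta_ar) from (320), (321) and (336)-(338)."""
    a_cen = r * omega**2                 # (320)
    th_acen = th_r + 2.0 * th_omega      # (321)
    re = a_rr * math.cos(th_arr) - a_cen * math.cos(th_acen)   # (336)
    im = a_rr * math.sin(th_arr) - a_cen * math.sin(th_acen)   # (337)
    return math.hypot(re, im), math.atan2(im, re)               # magnitude and (338)


a_r, th_ar = radial_acceleration(5.0, 0.2, 2.0, 1.1, 0.05, 0.1)
print(a_r, th_ar)
```

With all phases zero this reduces to the ordinary $a_r = a_{rr} - r\omega^2$ of equation (307).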


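From equation (335) it follows that

$$a_\phi\cos\theta_{a\phi} = a_{\phi\phi}\cos\theta_{a\phi\phi} + a_c\cos\theta_{ac} \qquad (340)$$

$$a_\phi\sin\theta_{a\phi} = a_{\phi\phi}\sin\theta_{a\phi\phi} + a_c\sin\theta_{ac} \qquad (341)$$

$$\tan\theta_{a\phi} = \frac{a_{\phi\phi}\sin\theta_{a\phi\phi} + a_c\sin\theta_{ac}}{a_{\phi\phi}\cos\theta_{a\phi\phi} + a_c\cos\theta_{ac}} \qquad (342)$$

$$a_\phi^2 = a_{\phi\phi}^2 + a_c^2 + 2a_{\phi\phi}a_c\cos(\theta_{a\phi\phi} - \theta_{ac}) \qquad (343)$$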


In order to complete the calculation of the acceleration, the magnitude and phase angle of the vector sum of the radial and transverse components of acceleration need to be calculated. The complex number magnitude of the vector sum will be written as

$$\bar a = ae^{j\theta_a} \qquad (344)$$

so that

$$\bar a^2 = \bar a_r^2 + \bar a_\phi^2 \qquad (345)$$

from which it follows that

$$a^2\cos(2\theta_a) = a_\phi^2\cos(2\theta_{a\phi}) + a_r^2\cos(2\theta_{ar}) \qquad (346)$$

$$a^2\sin(2\theta_a) = a_\phi^2\sin(2\theta_{a\phi}) + a_r^2\sin(2\theta_{ar}) \qquad (347)$$

where $a_\phi$ and $\theta_{a\phi}$ are given by equations (343) and (342) respectively, and $a_r$ and $\theta_{ar}$ are given by equations (339) and (338) respectively. From equations (346) and (347) it follows that

$$a^4 = a_r^4 + a_\phi^4 + 2a_r^2a_\phi^2\cos[2(\theta_{a\phi} - \theta_{ar})] \qquad (348)$$

$$\tan(2\theta_a) = \frac{a_\phi^2\sin(2\theta_{a\phi}) + a_r^2\sin(2\theta_{ar})}{a_\phi^2\cos(2\theta_{a\phi}) + a_r^2\cos(2\theta_{ar})} \qquad (349)$$

The measured acceleration is equal to $a\cos\theta_a$.

The relativistic force equations for a particle moving in bulk matter or vacuum with broken internal symmetry are best written in terms of normal and tangential components. The equations of motion of a particle under the action of normal and tangential forces are written as

$$ F_N = m\gamma_T a_N = m\gamma_T a_N e^{j(\theta_{\gamma T} + \theta_{aN})} \tag{350A} $$

$$ F_T = m\gamma_T^3 a_T = m\gamma_T^3 a_T e^{j(3\theta_{\gamma T} + \theta_{aT})} \tag{350B} $$

where $F_N$ and $F_T$ are the normal and tangential complex number forces, and $a_N$ and $a_T$ are the complex number normal and tangential accelerations, written as

$$ a_N = a_N e^{j\theta_{aN}} \tag{351A} $$

$$ a_T = a_T e^{j\theta_{aT}} \tag{351B} $$

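and where the tangential velocity boost is written as

$$ \gamma_T = (1 - v_T^2/c^2)^{-1/2} = \gamma_T e^{j\theta_{\gamma T}} \tag{352A} $$

$$ v_T = v_T e^{j\theta_{vT}} \tag{352B} $$

with

$$ \gamma_T = \left[1 - 2(v_T/c)^2\cos(2\theta_{vT}) + (v_T/c)^4\right]^{-1/4} \tag{353} $$

$$ \tan(2\theta_{\gamma T}) = \frac{(v_T/c)^2\sin(2\theta_{vT})}{1 - (v_T/c)^2\cos(2\theta_{vT})} \tag{354} $$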
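Because $\gamma_T$ is a complex power of a complex number, (353) and (354) can be cross-checked directly with complex arithmetic. The Python sketch below does this, taking the principal branch of the power as an assumption; for $\theta_{vT} = 0$ it reduces to the ordinary Lorentz factor. The function name and sample values are illustrative.

```python
import cmath
import math


def complex_boost(v_over_c, theta_v):
    """Return (gamma_T, theta_gammaT) from eqs. (353)-(354).

    v_over_c is the magnitude v_T/c; theta_v is theta_vT in radians.
    """
    b2 = v_over_c**2
    gamma = (1 - 2 * b2 * math.cos(2 * theta_v) + b2**2) ** -0.25
    theta_gamma = 0.5 * math.atan2(b2 * math.sin(2 * theta_v),
                                   1 - b2 * math.cos(2 * theta_v))
    return gamma, theta_gamma


beta, th_v = 0.6, 0.05
# Direct evaluation of (352A) on the principal branch.
g = (1 - (beta * cmath.exp(1j * th_v))**2) ** -0.5
gamma, th_g = complex_boost(beta, th_v)
print(gamma, abs(g), th_g, cmath.phase(g))
```

The same function applies to the fluid boost of equations (386) through (388A), with $v_T/c$ replaced by $\beta$.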


Consider now the question of the conservation of angular momentum of a body under the action of a radial force field in uniformly rotating bulk matter or vacuum with broken internal symmetry. For a radial force field in a broken symmetry system, equations (340) and (341) become


$$ a_{\phi\phi}\cos\theta_{a\phi\phi} + a_c\cos\theta_{ac} = 0 \tag{355} $$

$$ a_{\phi\phi}\sin\theta_{a\phi\phi} + a_c\sin\theta_{ac} = 0 \tag{356} $$


In order for equations (355) and (356) to be satisfied, remembering that $a_{\phi\phi} > 0$ and $a_c > 0$, the following conditions must hold

$$ a_{\phi\phi} - a_c = 0 \tag{357} $$

$$ \tan\theta_{a\phi\phi} = \tan\theta_{ac} \tag{358} $$

or

$$ \theta_{a\phi\phi} = \theta_{ac} + \pi \tag{359} $$


For uniform rotation and a radial force, the combination of equations (323), (324A), (331), (294), and (357) gives the following equation

(360)

Because $d\omega/dr < 0$, equation (360) can be written as

(361)

Combining equations (333) and (359) gives, for uniform rotation and a central force,

(362)


From equation (362) it follows that for a radial force and uniform rotation

$$ \tan\theta_{\phi,t} = \tan\theta_{r,t} \tag{363} $$


Combining equations (296) and (325A) with equation (363) gives

(364)


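Substituting equation (364) into equation (361) gives

$$ r\frac{d\omega}{dr} + 2\omega = 0 \tag{365} $$

a differential equation whose solution is

$$ \omega r^2 = \text{constant} \tag{366} $$

where $\omega$ is given by equation (283). Dividing equations (364) and (365) also gives

$$ 2\theta_r + \theta_\omega = 2\theta_r + \theta_\phi + \theta_{\phi,t} - \theta_t - \theta_{t,t} = \text{constant} \tag{367} $$

so that in fact combining equations (366) and (367) gives

$$ \omega r^2 e^{j(\theta_\omega + 2\theta_r)} = \text{constant} \tag{368} $$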

which  is  the  expression  for  the  conservation  of  angular  momentum  for  a  particle 
of  unit  mass  uniformly  rotating  in  a  central  force  field  that  is  located  in 
bulk  matter  or  vacuum  wherein  the  space  and  time  coordinates  exhibit  a  broken 
symmetry. 

Equations (283) and (366) show that

(369)

Equation (369) allows a connection to be made between the $t = 0$ and $t = \infty$ rotational states of a central force system located in bulk matter or vacuum with broken internal symmetries, namely a relation between $r_0^2\,\omega(0)$ and $r_\infty^2\langle\omega(\infty)\rangle$, where $\langle\omega(\infty)\rangle$ is defined as a limit.
In a similar way, equation (367) allows a connection to be made between the $t = 0$ and $t = \infty$ values of the internal phase angles of the coordinates of a particle in a central force system located in bulk matter or vacuum:

$$ 2\theta_r(0) + \theta_\phi(0) + \theta_{\phi,t}(0) - \theta_t(0) - \theta_{t,t}(0) = 2\theta_r(\infty) + \theta_\phi(\infty) + \theta_{\phi,t}(\infty) - \theta_t(\infty) - \theta_{t,t}(\infty) \tag{372} $$


Equation (369) shows that rotational motion is shared between external and internal angular motions, and this equation may be of value for describing the rotation of galaxies, neutron stars, molecules, atoms, and atomic nuclei where internal angular motions may exist.


A special case of interest, especially for gravitationally bound systems such as stars or planets, is the situation where $\theta_r \neq 0$ but $d\theta_r/dt = 0$, $\theta_t = 0$ and $\theta_\phi = 0$. This gives the following results

(380)

Therefore the case of a time independent $\theta_r$ combined with $\theta_t = 0$ and $\theta_\phi = 0$ gives the standard kinematic and dynamic equations (373) through (376). Thus the effects of a time dependent $\theta_r$ with $\theta_t \neq 0$ and $\theta_\phi \neq 0$ can be discerned from anomalies in the rotational motion of stars, molecules, atoms, and atomic nuclei. However, the effects of a time independent $\theta_r$ with $\theta_t = 0$ and $\theta_\phi = 0$ can be discovered in non-rotating systems through its effect on the gravity and pressure of non-rotating (or slowly rotating) stars and planets. Section 7 shows the effects of $\theta_r$ on the equilibrium configurations of stars and planets.

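6. EULER EQUATIONS FOR BROKEN SYMMETRY MATTER. This section considers Euler's equations of motion for a broken symmetry fluid, and is a prelude to the study of stellar and planetary equilibrium, which is considered in Section 7. The standard special relativistic Euler equations for the radial and transverse directions are written as

$$ \left(\rho + \frac{P^a}{c^2}\right)\gamma_a^2 a_r^a = -\left(\frac{\partial P^a}{\partial r} + \frac{v_{ra}}{c^2}\frac{\partial P^a}{\partial t}\right) - \frac{\partial W^a}{\partial r} \tag{381} $$

$$ \left(\rho + \frac{P^a}{c^2}\right)\gamma_a^2 a_\phi^a = -\left(\frac{1}{r}\frac{\partial P^a}{\partial \phi} + \frac{v_{\phi a}}{c^2}\frac{\partial P^a}{\partial t}\right) - \frac{1}{r}\frac{\partial W^a}{\partial \phi} \tag{382} $$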


where $a_r^a$ and $a_\phi^a$ are the conventional radial and transverse components of acceleration, $\rho$ = proper mass density, $P^a$ = pressure, $W^a$ = macroscopic external force potential, and where

$$ \gamma_a = (1 - \beta_a^2)^{-1/2}, \qquad \beta_a = v_a/c, \qquad v_a^2 = v_{ra}^2 + v_{\phi a}^2 \tag{383} $$

In Section 7, $W$ will be taken to be the gravitational potential.

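It has been shown that in bulk matter the pressure has an internal phase angle, as represented by equation (10), and that the coordinates within bulk matter also have internal phases, as represented, for example, by equation (49) for the radial coordinate. Therefore the generalization of the special relativistic Euler equations to the case of bulk matter with broken internal symmetries is written as

$$ \left(\rho + \frac{P}{c^2}\right)\gamma^2 a_r = \gamma^2 a_r\left(\rho + \frac{P}{c^2}e^{j\theta_P}\right)e^{j(\theta_{ar} + 2\theta_\gamma)} = -\left(\frac{\partial P}{\partial r} + \frac{v_r}{c^2}\frac{\partial P}{\partial t}\right) - \frac{\partial W}{\partial r} \tag{384} $$

$$ \left(\rho + \frac{P}{c^2}\right)\gamma^2 a_\phi = \gamma^2 a_\phi\left(\rho + \frac{P}{c^2}e^{j\theta_P}\right)e^{j(\theta_{a\phi} + 2\theta_\gamma)} = -\left(\frac{1}{r}\frac{\partial P}{\partial \phi} + \frac{v_\phi}{c^2}\frac{\partial P}{\partial t}\right) - \frac{1}{r}\frac{\partial W}{\partial \phi} \tag{385} $$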


where $a_r$ and $a_\phi$ are given by equations (334) and (335) respectively, and $W$ can be written as in equation (246). The complex boost is written as

$$ \gamma = (1 - \beta^2)^{-1/2} = \gamma e^{j\theta_\gamma} \tag{386} $$

where

$$ \beta = v/c, \qquad v = v e^{j\theta_v}, \qquad v^2 = v_r^2 + v_\phi^2 \tag{387} $$

and where the boost magnitude and internal phase angle are given as

$$ \gamma = \left[1 - 2\beta^2\cos(2\theta_v) + \beta^4\right]^{-1/4} \tag{388} $$

$$ \tan(2\theta_\gamma) = \frac{\beta^2\sin(2\theta_v)}{1 - \beta^2\cos(2\theta_v)} \tag{388A} $$


The generalization of the relativistic Euler equations for bulk matter with broken internal symmetries can also be written for the x, y, and z coordinates, where $a_x$, $a_y$, and $a_z$ are given by equations (190) through (192), $\gamma$ is given in terms of $v/c$ by equation (386), and where

$$ v^2 = v_x^2 + v_y^2 + v_z^2 \tag{391A} $$

Equations (384) and (385), or equations (389) through (391), are simple generalizations of the standard special relativistic Euler equations to the case of bulk matter with broken internal symmetry.

Euler's equations will be used to relate the internal phase angles of the coordinates to the internal phase angle of the pressure. From the radial acceleration equation (384) it follows for $\partial P/\partial t = 0$ that

and where

(399)

and where $\theta_{r,r}$ is given by equation (64). For the case of an external potential it follows from the radial equation of motion (392) that


$$ \gamma^2 a_r\left[\rho\cos(\theta_{ar} + 2\theta_\gamma) + \frac{P}{c^2}\cos(\theta_{ar} + 2\theta_\gamma + \theta_P)\right] = D_P\cos(\Phi_P + \pi) + D_W\cos(\Phi_W + \pi) = -D_P\cos\Phi_P - D_W\cos\Phi_W \tag{400} $$

$$ \gamma^2 a_r\left[\rho\sin(\theta_{ar} + 2\theta_\gamma) + \frac{P}{c^2}\sin(\theta_{ar} + 2\theta_\gamma + \theta_P)\right] = D_P\sin(\Phi_P + \pi) + D_W\sin(\Phi_W + \pi) = -D_P\sin\Phi_P - D_W\sin\Phi_W \tag{401} $$


From equations (400) and (401) it follows that

$$ \gamma^4 a_r^2\left(\rho^2 + \frac{P^2}{c^4} + 2\rho\frac{P}{c^2}\cos\theta_P\right) = D_P^2 + D_W^2 + 2D_P D_W\cos(\Phi_W - \Phi_P) \tag{402} $$
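The modulus relation (402) follows from taking the absolute square of the left side of (384). The Python sketch below checks this numerically and also compares the real part with the left side of (400); the function names, the sample values, and the choice of units with $c = 1$ are assumptions for illustration.

```python
import cmath
import math


def euler_lhs(gamma, th_gamma, a_r, th_ar, rho, P, th_P, c=1.0):
    """Complex left side of eq. (384); its real and imaginary parts are the
    left sides of (400) and (401). Default c = 1 is an illustrative choice."""
    return (gamma**2 * a_r * (rho + (P / c**2) * cmath.exp(1j * th_P))
            * cmath.exp(1j * (th_ar + 2 * th_gamma)))


def lhs_modulus_squared(gamma, a_r, rho, P, th_P, c=1.0):
    """Left side of eq. (402)."""
    return gamma**4 * a_r**2 * (rho**2 + P**2 / c**4
                                + 2 * rho * (P / c**2) * math.cos(th_P))


args = dict(gamma=1.1, th_gamma=0.01, a_r=0.5, th_ar=0.2,
            rho=2.0, P=0.3, th_P=0.4)
z = euler_lhs(**args)
m2 = lhs_modulus_squared(args["gamma"], args["a_r"], args["rho"],
                         args["P"], args["th_P"])
print(abs(z)**2, m2)
```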

Equations (400) and (401) determine $a_r$ and $\theta_{ar}$. Note that $a_r$ and $\theta_{ar}$ are related to the component acceleration terms through equations (338) and (339). Expressions similar to equations (400) and (401) can be derived for the transverse acceleration from equation (385).

Consider now the case of static equilibrium. In this case the acceleration terms in equations (400) and (401) are equal to zero, with result

$$ D_P = D_W \tag{403} $$

$$ \tan(\Phi_P + \pi) = \tan(\Phi_W + \pi) \quad\text{or}\quad \tan\Phi_P = \tan\Phi_W \tag{404} $$

Because $D_P > 0$ and $D_W > 0$, the only way equations (400) and (401) can have their left hand sides equal to zero is to have $D_P = D_W$ and

$$ \cos\Phi_P = -\cos\Phi_W \tag{406} $$

$$ \sin\Phi_P = -\sin\Phi_W \tag{407} $$

which requires equation (405) to be valid while at the same time satisfying equation (404). Combining equations (396), (397) and (405) gives

(408)

Equations (403) and (408) are the equations for static equilibrium for the Euler equations describing bulk matter with broken internal symmetries under the action of an external potential (which also has a broken symmetry). The phase angle $\theta_P$ is determined by the relativistic state equation, as shown in Reference 6 for solids and quantum liquids and in an accompanying paper for the real gases. Therefore, since $\theta_W$ and $\theta_{W,r}$ are related to the coordinates $r$ and $\theta_r$, it is equations (403) and (408) that relate the phase angle $\theta_r$ of the radial coordinate to $P$ and $\theta_P$ of the equation of state. This will be made explicitly clear in Section 7, where gravitational equilibrium in stars and planets is considered.
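The static equilibrium conditions can be checked numerically: the right sides of (400) and (401) are the real and imaginary parts of $D_P e^{j(\Phi_P+\pi)} + D_W e^{j(\Phi_W+\pi)}$, which vanishes exactly when $D_P = D_W$ (403) and $\Phi_P = \Phi_W + \pi$ modulo $2\pi$, as (406) and (407) require. The Python sketch below is illustrative; its function names and tolerance are assumptions.

```python
import cmath
import math


def pressure_balances_potential(D_P, Phi_P, D_W, Phi_W, tol=1e-12):
    """True when D_P e^{j(Phi_P+pi)} + D_W e^{j(Phi_W+pi)} vanishes,
    i.e. the right sides of (400) and (401) are both zero."""
    total = (D_P * cmath.exp(1j * (Phi_P + math.pi))
             + D_W * cmath.exp(1j * (Phi_W + math.pi)))
    return abs(total) <= tol * max(D_P, D_W, 1.0)


def equilibrium_conditions(D_P, Phi_P, D_W, Phi_W, tol=1e-12):
    """Check D_P = D_W (403) and Phi_P = Phi_W + pi modulo 2*pi (406)-(407)."""
    dphi = (Phi_P - Phi_W - math.pi) % (2 * math.pi)
    same_phase = min(dphi, 2 * math.pi - dphi) <= tol
    return math.isclose(D_P, D_W, rel_tol=tol) and same_phase


# Both tests agree: balanced case, then a case with equal phases.
print(pressure_balances_potential(2.0, 0.3 + math.pi, 2.0, 0.3),
      equilibrium_conditions(2.0, 0.3 + math.pi, 2.0, 0.3))
print(pressure_balances_potential(2.0, 0.3, 2.0, 0.3),
      equilibrium_conditions(2.0, 0.3, 2.0, 0.3))
```

Equal magnitudes alone are not enough: the second printed pair fails because the two gradients point the same way in the complex plane.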

Strictly speaking, only for a bulk matter system in which an external potential acts can one define a variation of $\theta_r$ with spatial coordinates, because only in this case can a physical choice of origin of coordinates be made (such as the center of a star or planet) from which to measure the coordinate $r$ and thereby evaluate the denominators in equations (394) and (395). Only then is there a fixed reference point from which to calculate the variation of the internal phase angles of the coordinates over macroscopic distances. However, these internal phase angles are determined by the broken symmetry of the local pressure, $\theta_P$, and the broken symmetry of the local potential, $\theta_W$, through equations (384) and (385).

7. EQUILIBRIUM OF STARS AND PLANETS. The equilibrium of stars and planets that are composed of matter with broken internal symmetries can be obtained from the complex number form of Euler's equation (384) or the equivalent equations (403) and (408). The gravitational potential energy that includes the effects of the broken symmetry of the space coordinates is written as

$$ W = W e^{j\theta_W} = -\frac{GM\rho}{r} $$

$$ \theta_W = \pi - \theta_r $$

where $M = M(r)$ = mass at radius $r$. Newtonian gravity is assumed to be valid in this paper, so that the force depends only on $r$ (through $r^{-2}$). No explicit dependence on the angular coordinates $\psi$ or $\phi$ is assumed. However, the radial coordinate phase angle $\theta_r$ can depend on angles, $\theta_r = \theta_r(r, \psi, \phi)$.

The first equilibrium condition that is derived from the Euler equation is given by equation (403). Substituting equations (411) and (412) into equation (395) gives

(413)

and therefore substituting equations (394) and (413) into equation (403) gives the first equilibrium equation for a gravitating star as

(414)


Considering the fact that in a gravitating star or planet $\partial P/\partial r < 0$, equation (414) can be rewritten as

(415)

which reduces to the standard stellar equilibrium equation for $\theta_P = 0$ and $\theta_r = 0$, namely

$$ \frac{\partial P}{\partial r} = -\frac{GM\rho}{r^2} \tag{416} $$

where the mass is related to the density and radial coordinates by
Note that equation (415) can also be rewritten as

(418)


If the terms involving the internal phase angles in equation (415) are assumed to be small, it follows from this equation, by expanding the radicals and solving a quadratic equation for $\partial P/\partial r$, that to a first approximation

where

Therefore, to first order, the pressure gradient in equation (419) for stellar and planetary interiors with broken internal symmetry differs from the conventional result given in equation (416) by two opposing terms that depend on the internal phase angles. Solving for the mass $M$ from equation (414) and placing the expression in equation (417) gives the following combined equilibrium equation

or equivalently as

Similarly, using equation (419) for this purpose gives

where $V$ is given by equation
The second gravitational equilibrium equation can be obtained by noting that equations (399), (412) and (64) yield

where $\theta_{r,r}$ is given by equation (64), so that it follows from equations (408), (412), and (424) that the second gravitational equilibrium equation is

where $\theta_{P,r}$ is given by equation (398). Equation (425) can be used to solve for $\theta_r$ in terms of $\theta_P$ because this equation can be written as

Equation (427) can be simplified by writing

where $\delta_P$ is a small quantity which can be positive or negative. Combining equations (425) and (427) gives the second gravitational equilibrium condition as
From equation (398) it follows that the case of $\theta_P > 0$ and $\partial\theta_P/\partial r < 0$ (corresponding to planets and degenerate stars such as neutron stars and white dwarfs) gives $\theta_{P,r} > \pi$ or $\delta_P > 0$, and from equations (428) and (64) it follows that

$$ \theta_r < 0 \quad\text{and}\quad \theta_{r,r} > 0 $$

For gaseous stars it may be possible to have $\theta_P > 0$ or $\theta_P < 0$ because of a degeneracy in the state equation of the relativistic real gas (see accompanying paper on real gases). For gaseous stars with $\theta_P < 0$ and $\partial\theta_P/\partial r > 0$ it follows from equation (398) that $\theta_{P,r} < \pi$ or $\delta_P < 0$, and therefore from equation (428) it follows that $\theta_r > 0$. This analysis assumes that $\partial P/\partial r < 0$ for all stars and planets. Combining equations (396) and (397) with equations (412), (424), and (425) gives

$$ \Phi_P = \pi - 2\theta_r \tag{429} $$

$$ \Phi_W = -2\theta_r \tag{430} $$

Equation (429) follows from the fact that

$$ \frac{\partial}{\partial r}\left(\frac{1}{r}\right) = -\frac{1}{r^2} \tag{431} $$

Equation (428) is the second equilibrium equation derived from the general Euler equilibrium equation (408).
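Equations (429) and (430) can be illustrated by differentiating the potential $W = -GM\rho/r$ with respect to the complex radius $r = r e^{j\theta_r}$. The Python sketch below assumes, for illustration only, that $GM\rho$ is held constant and that $\partial/\partial r$ acts as an ordinary complex derivative, so that $\partial W/\partial r = GM\rho/r^2$ and, in static equilibrium, $\partial P/\partial r = -\partial W/\partial r$. The helper names are hypothetical.

```python
import cmath
import math


def gradient_phases(GM_rho, r, theta_r):
    """Phases of dW/dr and dP/dr for W = -GM_rho / r with complex r.

    Illustrative assumption: GM_rho is constant and d/dr is the complex
    derivative, so d(-GM_rho / r)/dr = GM_rho / r**2.
    """
    z = cmath.rect(r, theta_r)      # complex radius r e^{j theta_r}
    dW_dr = GM_rho / z**2
    dP_dr = -dW_dr                  # static balance of pressure and potential
    return cmath.phase(dW_dr), cmath.phase(dP_dr)


def wrap(angle):
    """Map an angle into (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


theta_r = 0.01
phi_W, phi_P = gradient_phases(1.0, 2.0, theta_r)
print(phi_W, wrap(-2 * theta_r))            # compare with (430)
print(phi_P, wrap(math.pi - 2 * theta_r))   # compare with (429)
```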

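Equation (422), or the approximation equation (423), along with the equilibrium equation for the internal phases given in equation (428), are the two equilibrium equations for a gravitationally bound star or planet. These equations involve $P$, $\rho$, $\theta_P$, and $\theta_r$, so that clearly two additional equations are required for a complete solution of the equilibrium configuration (actually an energy generation equation is also required). The two additional equations that are required are the state equations

$$ P = P(\rho, T) \tag{432} $$

$$ \theta_P = \theta_P(\rho, T) \tag{433} $$

which specify the magnitude and internal phase angle of the complex number pressure. Equations (432) and (433) can be used to develop the following relationships

$$ \frac{\partial P}{\partial r} = \frac{\partial P}{\partial \rho}\frac{\partial \rho}{\partial r} + \frac{\partial P}{\partial T}\frac{\partial T}{\partial r} \tag{434A} $$

$$ \frac{\partial P}{\partial \psi} = \frac{\partial P}{\partial \rho}\frac{\partial \rho}{\partial \psi} + \frac{\partial P}{\partial T}\frac{\partial T}{\partial \psi} \tag{434B} $$

$$ \frac{\partial P}{\partial \phi} = \frac{\partial P}{\partial \rho}\frac{\partial \rho}{\partial \phi} + \frac{\partial P}{\partial T}\frac{\partial T}{\partial \phi} \tag{434C} $$

$$ \frac{\partial \theta_P}{\partial r} = \frac{\partial \theta_P}{\partial \rho}\frac{\partial \rho}{\partial r} + \frac{\partial \theta_P}{\partial T}\frac{\partial T}{\partial r} \tag{435A} $$

$$ \frac{\partial \theta_P}{\partial \psi} = \frac{\partial \theta_P}{\partial \rho}\frac{\partial \rho}{\partial \psi} + \frac{\partial \theta_P}{\partial T}\frac{\partial T}{\partial \psi} \tag{435B} $$

$$ \frac{\partial \theta_P}{\partial \phi} = \frac{\partial \theta_P}{\partial \rho}\frac{\partial \rho}{\partial \phi} + \frac{\partial \theta_P}{\partial T}\frac{\partial T}{\partial \phi} \tag{435C} $$

where $r$, $\psi$, and $\phi$ are the spherical coordinates whose origin is at the center of the star. Defining the following quantities

$$ \tan\theta_{P,\rho} = P\,\frac{\partial\theta_P/\partial\rho}{\partial P/\partial\rho} \tag{436} $$

$$ \tan\theta_{P,T} = P\,\frac{\partial\theta_P/\partial T}{\partial P/\partial T} \tag{437} $$

allows equations (435A) through (435C) to be written as

$$ \frac{\partial\theta_P}{\partial r} = \frac{1}{P}\left(\tan\theta_{P,\rho}\,\frac{\partial P}{\partial\rho}\frac{\partial\rho}{\partial r} + \tan\theta_{P,T}\,\frac{\partial P}{\partial T}\frac{\partial T}{\partial r}\right) \tag{438A} $$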
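The rewriting of (435A) as (438A) is the chain rule combined with definitions (436) and (437). The Python sketch below confirms the identity numerically with central differences; the state equations `P` and `theta_P` in it are hypothetical toy forms chosen only to exercise the algebra, not a physical equation of state.

```python
import math


# Hypothetical toy state equations (not from the paper), used only to
# exercise the algebra of (435A)-(438A).
def P(rho, T):
    return rho**2 + 0.5 * rho * T


def theta_P(rho, T):
    return 0.01 * rho - 0.002 * T


def partial(f, x, y, wrt, h=1e-6):
    """Central-difference partial derivative of f(x, y) in argument wrt (0 or 1)."""
    if wrt == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def dtheta_dr_direct(rho, T, drho_dr, dT_dr):
    """Eq. (435A): chain rule applied to theta_P(rho, T)."""
    return (partial(theta_P, rho, T, 0) * drho_dr
            + partial(theta_P, rho, T, 1) * dT_dr)


def dtheta_dr_via_tangents(rho, T, drho_dr, dT_dr):
    """Eq. (438A), using tan(theta_P,rho) and tan(theta_P,T) from (436)-(437)."""
    p = P(rho, T)
    dP_drho = partial(P, rho, T, 0)
    dP_dT = partial(P, rho, T, 1)
    tan_rho = p * partial(theta_P, rho, T, 0) / dP_drho   # eq. (436)
    tan_T = p * partial(theta_P, rho, T, 1) / dP_dT       # eq. (437)
    return (tan_rho * dP_drho * drho_dr + tan_T * dP_dT * dT_dr) / p


print(dtheta_dr_direct(2.0, 3.0, -0.5, -0.1),
      dtheta_dr_via_tangents(2.0, 3.0, -0.5, -0.1))
```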
Similarly for the internal phase angle of the radial coordinate

(439C)

which can be used to evaluate equations (64) and (67). Equations similar to equations (440) hold for $\theta_\psi$ and $\theta_\phi$, but these internal phase angles are taken to be zero in the simplest theory of gravitational equilibrium. In any case, it is clear that the determination of the equilibrium configuration of stars and planets requires the determination of $\theta_P(r)$ and $\theta_r(r)$ as part of the solution.

Both of these phase angles must approach their vacuum values at the surface of the star or planet.

The magnitude $P(\rho,T)$ and internal phase angle $\theta_P(\rho,T)$ of the relativistic pressure are obtained from a solution of the relativistic trace equation (1), along with the magnitudes and internal phase angles of the other thermodynamic functions (Reference 6). A $T = 0$ degenerate neutron gas state equation with a pressure described by $P^0(\rho)$ and $\theta_P(\rho)$, which is obtained from the solution of the $T = 0$ form of the relativistic trace equation (1), can serve as an adequate description of a neutron star (Reference 6). The radial variation of the internal phase angle of the radial coordinate of a neutron star can be determined from $P^0(\rho)$ and $\theta_P(\rho)$ using equations (422) and (428). For the interacting classical or quantum gases that occur in ordinary stars, the internal phase angle $\theta_P(\rho,T)$ can be evaluated from the relativistic third and higher virial coefficients of a real classical or quantum gas at high temperatures. The relativistic third and higher virial coefficients are obtained from a solution of the relativistic trace equation (1) for the real gases. Therefore the relativistic third and higher virial coefficients of the state equation of real gases will play an important role in the determination of the equilibrium conditions of ordinary gaseous stars.

The equilibrium of gravitating planets is treated in a slightly different manner than for stars, but the two basic equilibrium equations (422) and (428) are also valid for gravitating planets. Equation (422) will be written in a slightly different form for planets. As in the case for stars, the complex number equilibrium equation is written as

$$ \frac{\partial P}{\partial r} = -\frac{GM\rho}{r^2} \tag{441} $$

or

$$ D_P e^{j\Phi_P} = -\frac{GM\rho}{r^2} \tag{442} $$

where $D_P$ is given by equation (394) and $\Phi_P$ is given by equation (396). Equation (441) can be rewritten in terms of a density derivative by introducing the bulk modulus at constant entropy $K_S$. In order to determine $K_S$, the bulk modulus at constant temperature $K_T$ must first be introduced. The constant temperature bulk modulus is given by (Reference 6)


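$$ K_T = \rho\frac{\partial P}{\partial\rho} = K_T e^{j(\theta_P + \theta_{P,\rho})} \tag{443} $$

where

$$ K_T = \frac{\rho}{\cos\theta_{P,\rho}}\frac{\partial P}{\partial\rho} \tag{444} $$

and where $\theta_{P,\rho}$ is given by equation (436). The bulk modulus at constant entropy is easily found to be given by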


$$ K_S = K_T + \gamma N \tag{445} $$

where the complex number Grüneisen function $\gamma$ is given in equation (5). Equation (445) can be written in component form as

$$ K_S\cos\theta_{KS} = K_T\cos(\theta_P + \theta_{P,\rho}) + \gamma N\cos(\theta_\gamma + \theta_P + \theta_{P,T}) \tag{446} $$

$$ K_S\sin\theta_{KS} = K_T\sin(\theta_P + \theta_{P,\rho}) + \gamma N\sin(\theta_\gamma + \theta_P + \theta_{P,T}) \tag{447} $$

where expressions for the magnitude $\gamma$ and internal phase $\theta_\gamma$ of the Grüneisen function are given in Reference 6, $\theta_{P,\rho}$ and $\theta_{P,T}$ are given by equations (436) and (437) respectively, and where (Reference 6)

(448)


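Equations (446) and (447) give immediately

$$ \tan\theta_{KS} = \frac{K_T\sin(\theta_P + \theta_{P,\rho}) + \gamma N\sin(\theta_\gamma + \theta_P + \theta_{P,T})}{K_T\cos(\theta_P + \theta_{P,\rho}) + \gamma N\cos(\theta_\gamma + \theta_P + \theta_{P,T})} \tag{449} $$

$$ K_S^2 = K_T^2 + \gamma^2N^2 + 2\gamma N K_T\cos(\theta_\gamma + \theta_{P,T} - \theta_{P,\rho}) \tag{450} $$

which allow the calculation of the phase angle and magnitude of the bulk modulus at constant entropy.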
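Equations (449) and (450) are the polar form of the complex sum whose components are (446) and (447). The Python sketch below evaluates them and checks the result against direct complex addition; treating $\gamma N$ as a single input magnitude, the function name, and the sample numbers are assumptions for illustration.

```python
import cmath
import math


def adiabatic_bulk_modulus(K_T, gammaN, th_P, th_P_rho, th_P_T, th_gamma):
    """Return (K_S, theta_KS) from eqs. (449)-(450); angles in radians."""
    phi_T = th_P + th_P_rho              # phase of the K_T term in (446)-(447)
    phi_N = th_gamma + th_P + th_P_T     # phase of the gamma*N term
    s = K_T * math.sin(phi_T) + gammaN * math.sin(phi_N)
    c = K_T * math.cos(phi_T) + gammaN * math.cos(phi_N)
    K_S = math.sqrt(K_T**2 + gammaN**2
                    + 2 * gammaN * K_T * math.cos(th_gamma + th_P_T - th_P_rho))
    return K_S, math.atan2(s, c)


# Illustrative numbers only: K_T, gamma*N, theta_P, theta_P,rho, theta_P,T, theta_gamma.
K_S, th_KS = adiabatic_bulk_modulus(130.0, 4.0, 0.02, 0.01, -0.03, 0.05)
direct = cmath.rect(130.0, 0.02 + 0.01) + cmath.rect(4.0, 0.05 + 0.02 - 0.03)
print(K_S, abs(direct), th_KS, cmath.phase(direct))
```

With all internal phases set to zero, $K_S$ reduces to the conventional sum $K_T + \gamma N$.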

Combining equations (434A), (441) and (445) gives the following approximation for a planet with broken symmetry matter

$$ \frac{\partial\rho}{\partial r} = -\frac{GM\rho}{v_S^2 r^2} \tag{451} $$

where the adiabatic velocity of elastic waves in a material with broken internal symmetry is given as

$$ v_S^2 = v_S^2 e^{j2\theta_{vS}} = \frac{K_S}{\rho} \tag{452} $$

$$ v_S = \left(\frac{K_S}{\rho}\right)^{1/2} \tag{453} $$

$$ \theta_{vS} = \frac{\theta_{KS}}{2} \tag{454} $$

Equation (451) can also be rewritten as

where

(456)

$$ \cos\theta_{r,r} = \left[1 + \left(r\,\frac{\partial\theta_r}{\partial r}\right)^2\right]^{-1/2} $$

Substituting the expression for the mass in equation (455) into equation (417) gives

(458)


where $v_S$ is obtained from equations (450) and (453). Equation (458) is the first equilibrium equation for gravitating planets and is the analog of equation (422) for stars, while equation (456) is the second equilibrium equation for gravitating planets with broken internal symmetry and is the analog of equation (428) for stars with broken internal symmetry. Finally, it should be pointed out that for matter with broken internal symmetries the adiabatic wave velocity is given by a simple formula, analogous to the conventional formula for symmetric matter, as follows

$$ v_S^2 = \alpha^2 - \frac{4}{3}\beta^2 \tag{459} $$

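where $\alpha$ and $\beta$ are the compression and shear wave velocities respectively for matter with broken internal symmetries. Writing

$$ \alpha = \alpha e^{j\theta_\alpha}, \qquad \beta = \beta e^{j\theta_\beta} \tag{460} $$

gives

$$ v_S^2\cos(2\theta_{vS}) = \alpha^2\cos(2\theta_\alpha) - \frac{4}{3}\beta^2\cos(2\theta_\beta) \tag{461} $$

$$ v_S^2\sin(2\theta_{vS}) = \alpha^2\sin(2\theta_\alpha) - \frac{4}{3}\beta^2\sin(2\theta_\beta) \tag{462} $$

which are equivalent to the following equations

$$ \tan(2\theta_{vS}) = \frac{\alpha^2\sin(2\theta_\alpha) - \frac{4}{3}\beta^2\sin(2\theta_\beta)}{\alpha^2\cos(2\theta_\alpha) - \frac{4}{3}\beta^2\cos(2\theta_\beta)} \tag{463} $$

$$ v_S^4 = \alpha^4 + \frac{16}{9}\beta^4 - \frac{8}{3}\alpha^2\beta^2\cos[2(\theta_\alpha - \theta_\beta)] \tag{464} $$

The measured adiabatic wave velocity is $v_S\cos\theta_{vS}$, while the measured compression and shear wave velocities are $\alpha\cos\theta_\alpha$ and $\beta\cos\theta_\beta$.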
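Equations (463) and (464) give $v_S$ and $\theta_{vS}$ from the complex compression and shear velocities. The Python sketch below evaluates them and checks the result against the principal complex square root of $\alpha^2 - \tfrac{4}{3}\beta^2$ from (459); the principal-branch choice and the sample velocities are assumptions for illustration.

```python
import cmath
import math


def adiabatic_velocity(alpha, th_alpha, beta, th_beta):
    """Return (v_S, theta_vS) from eqs. (463)-(464); angles in radians."""
    s = alpha**2 * math.sin(2 * th_alpha) - (4 / 3) * beta**2 * math.sin(2 * th_beta)
    c = alpha**2 * math.cos(2 * th_alpha) - (4 / 3) * beta**2 * math.cos(2 * th_beta)
    v4 = (alpha**4 + (16 / 9) * beta**4
          - (8 / 3) * alpha**2 * beta**2 * math.cos(2 * (th_alpha - th_beta)))
    # (463) fixes 2*theta_vS; halve the principal atan2 angle.
    return max(v4, 0.0) ** 0.25, 0.5 * math.atan2(s, c)


# Illustrative velocities in km/s with small internal phases.
v_S, th_vS = adiabatic_velocity(8.0, 0.01, 4.5, 0.02)
w = cmath.sqrt((8.0 * cmath.exp(0.01j))**2 - (4 / 3) * (4.5 * cmath.exp(0.02j))**2)
print(v_S, abs(w), th_vS, cmath.phase(w))
print("measured v_S cos(theta_vS):", v_S * math.cos(th_vS))
```

Inverting (463) and (464) for $\alpha$ and $\beta$, as discussed next, would require additional information, since these two equations alone do not fix all four of $\alpha$, $\beta$, $\theta_\alpha$ and $\theta_\beta$.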

A knowledge of $P$ and $\theta_P$ as a function of density and temperature can be obtained experimentally from high pressure measurements on earth materials such as olivine and gabbro. Alternatively, $P$ and $\theta_P$ can be obtained from the solution of the relativistic trace equation (1) if the unrenormalized pressure and the Grüneisen function can be estimated from atomic structure. The seismic wave velocity $v_S$ and its internal phase angle $\theta_{vS}$ can then be obtained from equations (453) and (454) respectively. Finally, equations (463) and (464) can be inverted to find the relativistic values of the compression wave velocity $\alpha$ and the shear wave velocity $\beta$. It may be possible to reverse the arguments and measure $\alpha$ and $\beta$, which gives $v_S$ and $\theta_{vS}$ by equations (463) and (464), and then obtain $P$ and $\theta_P$ from equations (449), (450), (453), and (454). Equations (456) and (458) are the equilibrium equations for a planet, whose solution gives $\rho(r)$ and $\theta_r(r)$ in terms of $P$ and $\theta_P$. As in the case of the equilibrium calculation for stars, two auxiliary state equations of the form given in equations (432) and (433) are required. In any case, it is clear that $P$, $\theta_P$, and $\theta_r$ are required for an understanding of the equilibrium configuration and seismic properties of a planet.

From the previous analysis it is clear that the Newtonian force of gravity acting on a unit mass at a distance $r$ from the center of a spherical body of mass $M(r)$ with broken internal symmetry is written as

where $r_m = r\cos\theta_r$ = measured value of the radial distance between two points, $F$ = complex Newtonian gravity force with internal phase, and $F_m$ = real part of the gravity force in the radial direction, which is the measured gravity force. The force $F_m$ must be compared to the force $F_c$ = conventional Newtonian gravity force for symmetric matter, which is given by

$$ F_c = \frac{GM}{r_m^2} = \frac{GM}{r^2\cos^2\theta_r} \tag{467} $$

The difference $F_c - F_m$ is given by

$$ F_R = F_c - F_m = \frac{GM}{r^2}\left[\cos^{-2}\theta_r - \cos(2\theta_r)\right] \tag{468} $$


where the last two approximations are valid for small $\theta_r$, and where $\theta_r = \theta_r(r, \psi, \phi)$ is a function of the spherical polar coordinates of the unit mass. Therefore the effect of broken symmetry matter on Newtonian gravity is to imply that there is a new additional repulsive gravity force $F_R$ in operation, which does not have a strictly $r^{-2}$ dependence on radial coordinates. But in fact gravity in the planets is Newtonian in form (neglecting general relativity effects) and has an $r^{-2}$ dependence, as given in equation (465) for broken symmetry matter. The apparent deviation from Newtonian gravity is due to the internal phase angle $\theta_r(r, \psi, \phi)$ of the radial coordinate, which can have a complicated coordinate dependence because of the inhomogeneous nature of the earth's core, mantle and crust. Equation (466) shows that $F_m$ does not have an $r^{-2}$ (or $r_m^{-2}$) dependence on coordinates.
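The size of the difference in (468) can be illustrated numerically. The Python sketch below assumes, consistent with the definitions in the text, that the complex force is $GM/r^2$ evaluated at $r = re^{j\theta_r}$, so that its real part is $F_m = (GM/r^2)\cos(2\theta_r)$; it then compares $F_c - F_m$ with the small-angle estimate $3\theta_r^2\,GM/r^2$, which follows from expanding $\cos^{-2}\theta_r - \cos(2\theta_r)$. The Earth-like values of $GM$ and $r$ and the function name are illustrative only.

```python
import math


def gravity_forces(GM, r, theta_r):
    """Return (F_m, F_c, F_c - F_m) per unit mass.

    F_m = (GM/r^2) cos(2 theta_r): real part of GM / (r e^{j theta_r})^2,
    an assumed form of (465)-(466). F_c = GM / r_m^2 with r_m = r cos(theta_r), eq. (467).
    """
    F_m = GM / r**2 * math.cos(2 * theta_r)
    r_m = r * math.cos(theta_r)
    F_c = GM / r_m**2
    return F_m, F_c, F_c - F_m


# Earth-like numbers (illustrative): GM in m^3/s^2, r in m.
GM, r = 3.986e14, 6.371e6
for th in (1e-4, 1e-3):
    F_m, F_c, dF = gravity_forces(GM, r, th)
    # Relative difference from (468) next to its small-angle estimate 3 theta_r^2.
    print(th, dF / (GM / r**2), 3 * th**2)
```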

The rate of change of the force of gravity with radial distance is obtained for broken symmetry matter from equation (466) to be

and for radial variations only

The corresponding variation for the conventionally calculated Newtonian gravity force given by equation (467) is

Introduce the parameter

then a simple calculation shows that to second order in $\theta_r$ (there are no first order terms)

(474)

(475)

and $\partial\theta_r/\partial r > 0$, so that in general $n$ should be small and

Therefore experimental measurements of the variation of the force of gravity with height should indicate $D < 0$, while measurements of the gravity force itself should yield $F_R > 0$. The net result of the internal phase of the radial coordinate is that the measured gravity force given by equation (466) should be slightly weaker than that predicted by the conventional Newtonian force given by equation (467). A weaker than Newtonian gravity force has been experimentally observed in geophysical measurements and in new Eötvös experiments. These results have been interpreted to be due to a new finite range repulsive force associated with gravity. Reference 21 contains many citations to the literature in this field. However, the results of the present paper show that the weaker attractive force that is observed may in fact be due to ordinary Newtonian gravity operating in matter with broken internal symmetries, as in equation (465). A complete understanding of the earth's gravity field will require a detailed knowledge of the internal phase of the radial coordinate and its variation with location. The orbits of satellites and ballistic missiles will be affected by the internal phase function $\theta_r(r, \psi, \phi)$, and perturbations in these orbits that are not explained totally by shape and density variations in the earth may lead to techniques for determining local values of $\theta_r$.

8. CONCLUSION. By means of a relativistic trace equation, the Minkowski metric of spacetime impresses a broken symmetry on the matter and vacuum that are embedded in spacetime. The broken internal symmetries of matter and the vacuum are manifested at the microscopic level through the internal phase angles that are associated with the coordinates and the kinematic and dynamic variables for single particles. At the macroscopic level the broken symmetries appear in the thermodynamic functions, such as pressure and internal energy, of interacting systems. Within bulk matter and the vacuum, space and time exhibit broken symmetries that are manifested by internal phase angles that produce the broken symmetries of the kinematic and dynamic parameters and the broken symmetry of geometrical constructs such as angles, lengths and areas. The physical rotation of matter must also be associated with the rotation of the internal phase angles of the space and time coordinates. The internal phase angles of the kinematic and dynamic parameters of bulk matter fluid elements are determined by the Euler equations for broken symmetry matter. The calculation of the equilibrium configurations of stars, planets and other gravitationally bound systems such as galaxies must include the determination of the spatial dependence of the internal phase angle of the radial coordinate along with the spatial variation of the pressure and density. This can only be done if the state equation of broken symmetry matter is known from solutions of the basic complex number relativistic trace equation. The fact that time and space are gauge rotated quantities should affect the basic calculations of astrophysics, geophysics, and the engineering disciplines.
ABSTRACT


Water in unsaturated frozen soils generally exists in three phases: vapor, unfrozen (liquid) water and ice. Recent experimental data indicate that the flux of water $f$ in frozen soils may be written in the general form

$$f = -\rho\,D_1(w,T)\,\frac{\partial w}{\partial x} - \rho\,D_2(w,T)\,\frac{\partial T}{\partial x}$$

where $\rho$ is the dry density, and $D_1$ and $D_2$ are properties of a given soil that generally depend on the total water content $w$ (all three phases) and on the temperature $T$. Since $D_1$ and $D_2$ may vanish depending on $w$ and $T$, the equation of mass balance becomes a quasilinear, degenerate equation of parabolic type. Our presentation focuses on two special cases of this quasilinear problem, which we encountered during our search for accurate experimental methods to determine $D_1$ and $D_2$.


INTRODUCTION 

Water in unsaturated frozen soils generally exists in three phases: vapor, unfrozen (liquid) water and ice. We denote the content of water in all three phases by $w$. Reported experimental data indicate that a gradient of $w$ and a gradient of temperature $T$ are the two major driving forces of water in unsaturated frozen soils. Hence, the unidirectional flux of water $f$ is given as

$$f = f_1 + f_2 \qquad (1)$$

$$f_1 = -\rho\,D_1(w,T)\,\frac{\partial w}{\partial x} \qquad (2a)$$

$$f_2 = -\rho\,D_2(w,T)\,\frac{\partial T}{\partial x} \qquad (2b)$$

where $\rho$ is the dry density of the soil, a given positive number, $x$ is a coordinate, and $t$ is time. The nonnegative functions $D_1$ and $D_2$ are properties of a given soil that must be determined experimentally. Below we describe experimental methods for measuring $D_1$ and $D_2$ and discuss mathematical problems associated with these experiments.
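As an illustration of Eqs. 1, 2a and 2b ($f = f_1 + f_2$, $f_1 = -\rho D_1 \partial w/\partial x$, $f_2 = -\rho D_2 \partial T/\partial x$), the sketch below evaluates the two flux components on sampled profiles of $w$ and $T$ by finite differences. It assumes the caller supplies $D_1$ and $D_2$ as vectorised functions; the function name `water_flux` and all numbers in the usage lines are hypothetical, not measured soil properties.

```python
import numpy as np


def water_flux(x, w, T, rho, D1, D2):
    """Evaluate f = f1 + f2 (Eqs. 1, 2a, 2b) on sampled profiles.

    x : increasing coordinates; w, T : water content and temperature at x.
    D1(w, T), D2(w, T) : caller-supplied (hypothetical) soil-property functions.
    """
    x, w, T = (np.asarray(v, dtype=float) for v in (x, w, T))
    dw_dx = np.gradient(w, x)            # finite-difference estimate of dw/dx
    dT_dx = np.gradient(T, x)            # finite-difference estimate of dT/dx
    f1 = -rho * D1(w, T) * dw_dx         # Eq. 2a: driven by the water-content gradient
    f2 = -rho * D2(w, T) * dT_dx         # Eq. 2b: driven by the temperature gradient
    return f1 + f2, f1, f2


# Uniform water content in a linear temperature field (made-up values):
# only f2 is nonzero, and it is positive, i.e. water moves toward the colder end.
x = np.linspace(0.0, 10.0, 21)
w = np.full_like(x, 15.0)
T = -1.0 - 0.3 * x
f, f1, f2 = water_flux(x, w, T, rho=1.5,
                       D1=lambda w, T: 0.5 * np.ones_like(w),
                       D2=lambda w, T: 0.2 * np.ones_like(w))
```

With a uniform $w$ the first term vanishes, so the flux is $\rho D_2 \cdot 0.3 = 0.09$ everywhere in these units, directed toward the cold end.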


FUNCTION D1(w,T)


The experiment consisted of connecting two long columns of soil that were of the same size and the same dry density under an isothermal condition. One of them was uniformly dry, with a negligibly small water content, while the other was uniformly wet. At time $t = 0$ we connected the two columns to make a single column from which no water escaped. While we maintained the column at a specified temperature, water was transported from the wet part to the dry part across the contact surface between the wet and the dry columns. After a specified time had passed, the soil column was quickly sectioned into many equal segments. The water content of each segment was determined gravimetrically.

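The experiment is described by the following initial value problem:

$$\frac{\partial w}{\partial t} = \frac{\partial}{\partial x}\left[D_1(w,T)\,\frac{\partial w}{\partial x}\right], \qquad -\infty < x < +\infty,\; t > 0 \qquad (3)$$

$$w(x,0) = \begin{cases} w_0, & -\infty < x < 0 \\ 0, & 0 \le x < +\infty \end{cases} \qquad (4)$$

where $T$ is a given temperature and $w_0$ is a given positive number. When we seek a similarity solution $u(\eta) = w(x,t)$ with $\eta = x\,t^{-1/2}$, Eqs. 3 and 4 are reduced to

$$\left[D_1(u,T)\,u'\right]' + \frac{\eta}{2}\,u' = 0, \qquad -\infty < \eta < +\infty \qquad (5)$$

$$u(-\infty) = w_0, \qquad u(+\infty) = 0 \qquad (6)$$

where primes denote differentiation with respect to $\eta$.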

Since $D_1$ vanishes at $w = 0$, Eq. 3 degenerates at this point. It is known that the problem of Eqs. 3 and 4 has a unique weak solution. It is also known that the problem of Eqs. 5 and 6 has a unique weak solution that is the asymptotic solution of the problem of Eqs. 3 and 4. Integrating Eq. 5, we obtain

$$D_1\bigl(u(\eta),T\bigr) = -\frac{1}{2\,u'(\eta)}\int_0^{u(\eta)} \eta\,du \qquad (7)$$

Using measured profiles $w(x)$ in place of $u$ in Eq. 7, we determined the value of $D_1(w,T)$.
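Eq. 7 has the form of the classical Boltzmann-Matano construction (our label; the text does not name it). The sketch below applies $D_1(u(\eta)) = -\frac{1}{2u'(\eta)}\int_0^{u(\eta)}\eta\,du$ to one sectioned profile $w(x)$ measured at time $t$. It assumes the samples are ordered by increasing $x$, that the original contact surface lies at $x = 0$, and that $w$ has fallen to about zero at the dry end. The function name is hypothetical. As a check it is fed the exact solution for constant $D_1$, $u = \tfrac{w_0}{2}\,\mathrm{erfc}\bigl(\eta/(2\sqrt{D_1})\bigr)$, with made-up values.

```python
import math

import numpy as np


def d1_from_profile(x, w, t):
    """Estimate D1 at each sample of a profile w(x) measured at time t (Eq. 7).

    Assumes x increases, the original wet/dry contact surface is at x = 0, and
    w falls from the wet side (x < 0) to about zero at the dry end.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(w, dtype=float)
    eta = x / np.sqrt(t)                      # similarity variable eta = x t^(-1/2)
    du = np.gradient(u, eta)                  # u'(eta) by finite differences
    # S_i = integral of eta du from sample i to the dry end (trapezoid rule);
    # it equals -integral_0^{u(eta_i)} eta du when u vanishes at the dry end.
    seg = 0.5 * (eta[:-1] + eta[1:]) * (u[1:] - u[:-1])
    S = np.zeros_like(u)
    S[:-1] = np.cumsum(seg[::-1])[::-1]
    # Eq. 7: D1 = -(1 / (2 u')) * integral_0^u eta du = S / (2 u')
    with np.errstate(divide="ignore", invalid="ignore"):
        return S / (2.0 * du)


# Synthetic check with constant D1 = 0.5 (hypothetical numbers).
x = np.linspace(-6.0, 6.0, 2001)
D, w0, t = 0.5, 20.0, 1.0
w = np.array([0.5 * w0 * math.erfc(xi / (2.0 * math.sqrt(D * t))) for xi in x])
D1_est = d1_from_profile(x, w, t)   # close to 0.5 over the steep part of the profile
```

Where $u'$ is nearly zero (the flat tails) the quotient is ill-conditioned, so in practice only the steep part of a measured profile gives useful values.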

The measured $D_1(w,T)$ of Morin clay is presented in Figure 1 as a function of $w$, with $T$ as a parameter. It is known that the unfrozen water content $w^*$ in a frozen soil depends mainly on the temperature $T$ and that $w^*$ decreases with decreasing $T$. Since the mobility of water in a frozen soil is mainly due to the unfrozen water, $D_1(w,T)$ decreases with decreasing $T$. A common feature in Figure 1 is that $D_1(w,T)$ for each given $T$ has two peaks. One of them is around $w = 1.0\%$ and the other is not far from the point where $w$ equals the maximum unfrozen water content $w^*$ at $T$. For instance, the value of $w^*$ at $T = -1.0$°C is about 12.7%. The content of water in the solid phase (ice) increases as $w$ increases beyond $w^*$. Since the presence of ice tends to decrease the mobility of unfrozen water, $D_1$ decreases with increasing ice content.


FUNCTION D2(w,T)


We consider a problem in which a closed soil column, initially with given uniform $\rho$ and $w$, is subjected at time $t > 0$ to a temperature $T_w$ on one end, $x = 0$, and to a temperature $T_c < T_w$ on the other end, $x = x_0$. We assume that the temperature distribution $T(x)$ is strictly linear, namely

$$\frac{dT}{dx} = -\alpha, \qquad \alpha = (T_w - T_c)/x_0 \qquad (8)$$

where $\alpha$ is a positive number. We describe the problem in mathematical terms as follows.


The equation of mass balance for water is given as

$$\rho\,\frac{\partial w(x,t)}{\partial t} = -\frac{\partial f}{\partial x} \qquad \text{for } x_0 > x > 0,\; t > 0 \qquad (9)$$

The initial condition is given as

$$w(x,0) = w_0 \qquad (10)$$

where $w_0$ is a positive number. The boundary condition is given as

$$f(0,t) = f(x_0,t) = 0 \qquad (11)$$

It follows from Eq. 8 that there is a one-to-one correspondence between $x$ and $T$. Expressing $T$ in terms of $x$ in Eqs. 2a and 2b and using Eq. 8, we rewrite $f$ as

$$f = -\rho\left[D_1(w,x)\,\frac{\partial w}{\partial x} - \alpha\,D_2(w,x)\right] \qquad (12)$$

We introduce a new function $\Phi(w,x)$ defined as

$$\Phi(w,x) = \int_0^{w} D_1(w,x)\,dw - \alpha\int_0^{x} D_2(w,x)\,dx \qquad (13)$$


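Using $\Phi$, we reduce the problem of Eqs. 9, 10 and 11 to a commonly used form given as

$$\frac{\partial w}{\partial t} = \frac{\partial^2 \Phi}{\partial x^2} \qquad \text{for } x_0 > x > 0,\; t > 0 \qquad (14)$$

$$w(x,0) = w_0 \qquad (15)$$

$$\frac{\partial \Phi}{\partial x}(0,t) = \frac{\partial \Phi}{\partial x}(x_0,t) = 0 \qquad (16)$$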

When we seek a stationary (time-independent) solution $w^+(x)$ to the problem of Eqs. 14, 15 and 16, $w^+(x)$ must satisfy the following equation, if it exists:

$$D_1(w^+,x)\,\frac{dw^+}{dx} = \alpha\,D_2(w^+,x) \qquad (17)$$

It follows from Eq. 17 that the stationary solution $w^+(x)$ is a nondecreasing function wherever $D_1$ is positive, since $D_1$ and $D_2$ are nonnegative.
The solution $w^+$ corresponds to a stationary state of the soil column in which the net flux of water $f$ vanishes everywhere in the column while the dry density is kept at its initial value. It should be mentioned that this stationary state is not necessarily a state of equilibrium, so a local circulation of water may occur. When the soil column, initially with uniform $\rho$ and $w$, is subjected to the temperature gradient given by Eq. 8, the transport of water is expected to occur in the positive direction of $x$ because of Eq. 2b. As water moves toward the cold end, the initial uniformity of $w$ breaks down and a driving force of water toward the warmer end starts to build up because of Eq. 2a. Sooner or later the two driving forces of water, one due to the temperature gradient and the other due to the gradient of water content, tend to balance each other, while the profile of water content $w(x,t)$ asymptotically approaches the stationary profile with increasing time. If we are able to measure $w^+(x)$ experimentally, then $D_2$ can be determined from Eq. 17 for a given soil with known $D_1$.
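Solving Eq. 17, $D_1(w^+,x)\,dw^+/dx = \alpha D_2(w^+,x)$, for $D_2$ gives $D_2 = D_1\,(dw^+/dx)/\alpha$. The following is a minimal sketch of that step for a sampled stationary profile, assuming $D_1(w,x)$ is already known as a vectorised function and that only the rising part of the profile is used. The function name, the profile and the value of $D_1$ in the usage are hypothetical.

```python
import numpy as np


def d2_from_stationary(x, w_plus, alpha, D1):
    """Evaluate D2 = D1(w+, x) * (dw+/dx) / alpha from Eq. 17.

    Returns NaN where dw+/dx <= 0, because only the rising part of the
    stationary profile is used; D1(w, x) is a caller-supplied function.
    """
    x = np.asarray(x, dtype=float)
    w_plus = np.asarray(w_plus, dtype=float)
    slope = np.gradient(w_plus, x)            # finite-difference dw+/dx
    d2 = D1(w_plus, x) * slope / alpha
    return np.where(slope > 0.0, d2, np.nan)


x = np.linspace(0.0, 5.0, 11)
w_plus = 12.0 + 0.5 * x                      # made-up rising stationary profile
D2_est = d2_from_stationary(x, w_plus, alpha=0.31,
                            D1=lambda w, x: 2.0 * np.ones_like(w))
```

Because the estimate involves a numerical derivative, scatter in the measured $w^+(x)$ is amplified; some smoothing of the data before differentiation would normally be needed.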

It  is  not  certain  that  the  anticipated  event  described  above  actually 
takes  place.  For  instance,  the  time  required  for  the  column  to  reach  a 
stationary  state  may  turn  out  to  be  too  long  for  the  method  to  be  practical. 
These  problems  must  be  examined  experimentally. 
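Whether a column relaxes to a stationary state in a practical time can be explored numerically before committing to long experiments. Below is a minimal explicit finite-volume sketch of Eqs. 9, 10 and 11, using the flux of Eq. 12, $f/\rho = -(D_1\,\partial w/\partial x - \alpha D_2)$, with zero flux at both ends. It is not the authors' method. The property functions, the time-step rule and all numbers are assumptions made for illustration.

```python
import numpy as np


def relax_column(w0, x0, alpha, D1, D2, n=50, t_end=1.0, d1_max=1.0):
    """Explicit finite-volume sketch of Eqs. 9-11 (closed column, linear T).

    D1(w, x), D2(w, x): caller-supplied (hypothetical) vectorised functions,
    with T expressed through x by Eq. 8. d1_max bounds D1 and sets the step.
    """
    dx = x0 / n
    x_face = np.arange(1, n) * dx              # interior cell faces
    w = np.full(n, float(w0))                  # Eq. 10: uniform initial content
    dt = 0.4 * dx**2 / d1_max                  # diffusion stability limit only
    n_steps = int(np.ceil(t_end / dt))
    dt = t_end / n_steps
    for _ in range(n_steps):
        w_face = 0.5 * (w[:-1] + w[1:])
        # f / rho at interior faces, Eq. 12: -(D1 dw/dx - alpha D2)
        q = -(D1(w_face, x_face) * (w[1:] - w[:-1]) / dx
              - alpha * D2(w_face, x_face))
        q = np.concatenate(([0.0], q, [0.0]))  # Eq. 11: no flux at x = 0, x = x0
        w = w - dt * (q[1:] - q[:-1]) / dx     # Eq. 9 divided by rho
    return w, dx


def const_one(w, x):
    return np.ones_like(w)


w_final, dx = relax_column(w0=1.0, x0=1.0, alpha=0.5, D1=const_one,
                           D2=const_one, n=50, t_end=2.0)
```

With these made-up constant properties the profile relaxes to a straight line of slope $\alpha D_2/D_1 = 0.5$, while the flux form of the update conserves the total water content. The sketch does not attempt to reproduce the measured profiles.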

(1) Experimental Results

A typical evolution of the water content profile $w(x,t)$ with time (Refs. 12, 13) is presented in Figure 2 under the conditions $w_0 = 15\%$, $T_w = -1.40$°C, $T_c = -4.95$°C and $\alpha = 0.310$ °C/cm. It is clear from Figure 2 that water moves in the direction of the negative temperature gradient and that $w$ tends to converge to a stationary profile as time increases. The profiles at $t = 22$ days and at $t = 34$ days differ little. This implies that the flux of water $f(x)$ almost vanishes everywhere after $t = 22$ days. An interesting feature of these profiles is the appearance of a maximum.

The effect of $w_0$ on the stationary profile is shown in Figure 3. These experiments were conducted under the same thermal conditions as those in Figure 2. Among the four profiles in Figure 3, the measured stationary profile for the case $w_0 = 5\%$ is monotonically increasing, and a maximum appears in the profiles for the three other cases. We conducted many experiments similar to those presented in Figure 3 under various thermal conditions. From these experiments we found that, when $w_0$ is greater than about 10%, the stationary profile $w^+(x)$ generally consists of three parts, $R_1$, $R_2$ and $R_3$, distinguished by the sign of the first derivative $w^+_x(x)$.

These three parts are characterized as

$$w^+_x > 0 \quad \text{in } R_1\ (0 < x < x_m) \qquad (18a)$$

$$w^+_x < 0 \quad \text{in } R_2\ (x_m < x < x_n) \qquad (18b)$$

$$w^+_x = 0 \quad (w^+ = w_0) \quad \text{in } R_3\ (x_n < x < x_0) \qquad (18c)$$

where $x_m$ is the point where $w^+$ attains its maximum. We did not assign the value of $w^+_x$ at $x = x_m$ because we are not certain about the continuity of $w^+$ at $x_m$, in view of our measured profiles, which often had a maximum resembling a sharp peak.
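The classification in Eqs. 18a to 18c can be applied to sampled data with a short routine. The sketch below labels each sample by the sign of a finite-difference slope and reports the location of the largest sampled value as an estimate of $x_m$. The slope threshold `tol`, which absorbs measurement scatter, is an assumption not found in the text, and the function name and the piecewise-linear test profile are hypothetical.

```python
import numpy as np


def classify_regions(x, w, tol):
    """Split a sampled profile into the parts R1, R2, R3 of Eqs. 18a-18c.

    Returns labels (1: w_x > tol, 2: w_x < -tol, 3: |w_x| <= tol) and the
    location x_m of the largest sampled w. tol is a user-chosen threshold.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    slope = np.gradient(w, x)                  # finite-difference w_x
    labels = np.where(slope > tol, 1, np.where(slope < -tol, 2, 3))
    x_m = x[int(np.argmax(w))]                 # sample where w is largest
    return labels, x_m


# Made-up profile: rising, then falling, then flat.
x = np.linspace(0.0, 10.0, 101)
w = np.where(x < 4.0, 10.0 + x, np.where(x < 7.0, 14.0 - (x - 4.0), 11.0))
labels, x_m = classify_regions(x, w, tol=0.05)
```

Samples next to a sharp peak or a kink get a blended central-difference slope, so labels right at the region boundaries should be read with care.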


As the first step, we calculated $D_2$ from Eq. 17 using the part of $w^+$ in $R_1$, to find the properties of $D_2$ as a function of $w$ and $T$. The calculated values of $D_2$ are presented at four temperatures, -1.00, -0.50, -0.25 and -0.10°C, in Figure 4, where curves are drawn to show the general trend of the data points. A common feature in Figure 4 is that $D_2(w,T)$ for each given $T$ has one peak: $D_2$ increases as $w$ increases up to some value and then decreases as $w$ increases beyond it. This decrease of $D_2$ is caused by increasing ice content.

(2) Mathematical Problem

Let us assume that $D_1$ and $D_2$ are smooth functions of $w$ and $T$ given by Figures 1 and 4, respectively. Under this assumption it is easy to explain, from a physical point of view, the appearance of a maximum at an interior point $x_m$ as described by Eqs. 18a, 18b and 18c: when water moves in the direction of the negative temperature gradient, the lack of water movement in $R_3$ causes water to accumulate in some part where $x < x_n$. An important question arises whether the solution to the problem of Eqs. 14, 15 and 16 under the above assumption actually behaves like the measured profiles. We do not have the answer to this question. However, we will show below that Eq. 14 degenerates at the point $x_m$ where $w$ attains its maximum if the profile characterized by Eqs. 18a, 18b and 18c is a solution to the problem of Eqs. 14, 15 and 16.

We will consider the profile $w(x,t)$ in the earlier stage of an experiment such as Exp. 1 in Figure 2. From such a profile we find

$$w_x > 0,\quad f > 0 \quad \text{in } R_1\ (0 < x < x_m) \qquad (19a)$$

$$w_x < 0,\quad f > 0 \quad \text{in } R_2\ (x_m < x < x_n) \qquad (19b)$$

$$w_x = 0,\quad f = 0 \quad \text{in } R_3\ (x_n \le x < x_0) \qquad (19c)$$

We will evaluate $\Phi_w$ for the profiles given by Eqs. 19a, 19b and 19c. Differentiating Eq. 13 with respect to $w$ once, we obtain

$$\Phi_w = D_1(w,x) - \alpha\,D_2(w,x)\,x_w \qquad (20)$$


It follows from Eqs. 19a, 19b and 19c that a one-to-one correspondence generally does not exist between $w$ and $x$ for a given time $t$. However, we can find such a correspondence in each of $R_1$ and $R_2$ separately. Hence, we reduce Eq. 20 to

$$\Phi_w = D_1(w,x) - \alpha\,D_2(w,x)/w_x \qquad x < x_n \text{ and } x \ne x_m \qquad (21)$$

Using Eq. 12, we reduce Eq. 21 to

$$\Phi_w = -f/(\rho\,w_x) \qquad x < x_n \text{ and } x \ne x_m \qquad (22)$$

Since $f > 0$ in $R_1$ and $R_2$, from Eq. 22 we obtain

$$\Phi_w < 0 \quad \text{in } R_1 \qquad (23a)$$

$$\Phi_w > 0 \quad \text{in } R_2 \qquad (23b)$$

It follows from Eqs. 23a and 23b that $\Phi_w$ must vanish at $x_m$ under the above assumption, and that Eq. 14 degenerates at this point.
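The step from Eq. 21 to Eq. 22 uses only Eq. 12, $f = -\rho\,(D_1 w_x - \alpha D_2)$, so that $D_1 w_x - \alpha D_2 = -f/\rho$. Dividing by $w_x$ wherever $w_x \neq 0$:

$$\Phi_w = D_1(w,x) - \frac{\alpha\,D_2(w,x)}{w_x} = \frac{D_1 w_x - \alpha D_2}{w_x} = -\frac{f}{\rho\,w_x}.$$

With $f > 0$ and $\rho > 0$, the sign of $\Phi_w$ is opposite to the sign of $w_x$: negative in $R_1$, where $w_x > 0$, and positive in $R_2$, where $w_x < 0$. A continuous $\Phi_w$ therefore has to pass through zero at $x_m$.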

Oleinik et al. showed the existence of a unique weak solution to the problem of Eqs. 14, 15 and 16 under the condition that $\Phi_w = 0$ at $w = 0$ and $\Phi_w > 0$ for $w > 0$. In their problem Eq. 14 degenerates at $w = 0$. They showed that $w$ may not be continuous at a point of transition between a part where $w > 0$ and a part where $w = 0$, and that a point of degeneracy propagates with a finite speed. A boundary where a degeneracy occurs is often referred to as a free (or moving) boundary. In our problem Eq. 14 degenerates not only at a point where $w = 0$ but also at a point where $w$ attains its maximum ($\Phi_w$ changes its sign). Equations such as Eq. 14, in which the coefficient of the highest derivative changes its sign, have been intensively investigated lately. The results of such investigations are needed to understand the mechanism of water transport in frozen soils.
FIGURES

1. Function D1(w,T) vs. the water content w%.

2. Typical evolution of w(x,t) with time.

3. Effect of w0 on the stationary profile w+(x).

4. Function D2(w,T) vs. the water content w%.

FIGURE 1

Function D1(w,T) vs. the water content w%.

FIGURE 4

Function D2(w,T) vs. the water content w%.
SUMMARY

Finite element analysis of rubberlike materials requires the enforcement of an incompressibility condition. Penalty, Lagrange multiplier and mixed methods are typically used to enforce the constraint of incompressibility. These methods can lead to poorly conditioned tangent matrices or add a large number of variables, increasing the size of the tangent matrix. In this effort the use of the implicit variable elimination method is investigated for enforcing incompressibility in rubber elasticity finite element analysis. No penalty parameters or Lagrange multipliers are used, but the method is difficult to generalize to two- and three-dimensional analyses. The one-dimensional inflation of a thick rubber cylinder is formulated and solved to demonstrate the method.


INTRODUCTION 


The formulation of algorithms for the finite element analysis of large deformations of incompressible materials has involved many efforts since the mid-1960s. Several methods are in use at the present time. They include penalty, Lagrange multiplier and mixed methods solved with either updated or total Lagrangian algorithms. The basic problem is to compute the minimum of an energy functional such that a system of constraint equations is simultaneously satisfied. This is a constrained optimization problem and can be solved using methods from nonlinear constrained optimization theory. Additional techniques include successive quadratic programming and implicit variable elimination methods. Below we briefly mention references to the current methods in use and then describe a one-dimensional implicit variable elimination algorithm for the thick rubber cylinder. The review is intentionally brief.
BACKGROUND


The  hydrostatic  pressure  was  modeled  with  a  Lagrange  multiplier 
variable  and  used  to  attach  the  incompressibility  constraint  to  the 
potential  energy  for  rubber  by  Levinson  [1].  He  was  then  able  to  generate, 
solve  and  investigate  the  stability  of  solutions  to  the  equilibrium 
equations  for  the  internally  pressurized  Neo-Hookean  sphere.  Finite 
element  methods  using  the  Lagrange  multiplier  method  were  formulated  and 
reviewed  by  Oden  [2].  In  Oden's  formulations  nodal  displacements  and 
pressure  variables  are  related  through  the  nonlinear  equations  which 
represent  stationary  points  of  the  energy  functional.  He  then  solves 
illustrative  examples  including  large  deformations  of  rubber  membranes  and 
solids  of  revolution. 

Tielking and Feng [3] considered problems for which the incompressibility constraint can be satisfied directly, that is, problems for which computation of the force of constraint (hydrostatic pressure) is not an issue. Instead of using displacements as variables, they demonstrated the advantage of using configurations as variables. The Ritz method was then applied globally to obtain solutions to membrane problems. The configuration variable approach could then be used to construct a finite element algorithm.

While studying plasticity problems, Nagtegaal, Parks and Rice [4] made an important contribution to finite element theory when they recognized the problem of dependent or redundant constraint equations. Too many or too few constraint equations in the finite element model cause numerical difficulties: either poor convergence rates or locking. Efforts were then concentrated on how many pressure variables were best for a given element formulation.

An extensive review of work done by Argyris et al. [5] included a special "fluid filled" finite element. These elements develop an internal energy when their volume (area) changes. The energy is minimized during the solution process, making it a penalty-like formulation. These penalty and mixed methods were under review at the same time by Hughes and Malkus [6,7]. The penalty method was proven to be equivalent, in the limit, to the Lagrange multiplier method. It was also noticed that quadratic convergence of the Newton-Raphson method is lost when large penalty parameters are used (near incompressibility). The use of the configuration variables mentioned above with a penalty enforcement of incompressibility was investigated by Fried, Johnson, and Quigley [8,9,10]. These formulations allow for efficient computation of gradient and tangent matrices but are still subject to poor convergence when large penalty parameters are used.

A completely different approach to enforcing incompressibility was investigated by Needleman and Shih [11]. They used an implicit variable elimination method to enforce the divergence equation (incompressibility constraint) for small strain plasticity problems. The number of displacement variables is reduced by this method, and the hydrostatic pressures are determined after the displacement solution is obtained. Because of the element-to-element interdependence of the incompressibility constraint, superelements must be constructed during the variable elimination process.

The enforcement of contact constraints for large deformation minimization problems involving configuration variables has been investigated by Johnson and Quigley [12,13,14]. Penalty, successive quadratic programming and implicit variable elimination methods have been used successfully.

In this effort we investigate implicit variable elimination for large strain rubberlike deformations. A formulation and results are presented for the one-dimensional axisymmetric deformations of an internally pressurized thick rubber cylinder.

INFINITE  CYLINDER  MODEL 


In this section we construct unconstrained gradient and tangent matrices for the one-dimensional expansion of a thick rubber cylinder; see Figure 1. We let $(a, r)$ represent the (undeformed, deformed) configurations. Then the principal stretch ratios become:

$$\lambda_1 = 1.0, \qquad \lambda_2 = \frac{dr}{da}, \qquad \lambda_3 = \frac{r}{a}$$
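For an incompressible material the principal stretches must satisfy $\lambda_1\lambda_2\lambda_3 = 1$. With $\lambda_1 = 1$, $\lambda_2 = dr/da$ and $\lambda_3 = r/a$, a short derivation shows why the incompressibility constraint for this problem can be written purely in terms of nodal radii:

$$\lambda_1\lambda_2\lambda_3 = \frac{dr}{da}\,\frac{r}{a} = 1 \;\Longrightarrow\; r\,dr = a\,da \;\Longrightarrow\; r^2 - r_1^2 = a^2 - a_1^2,$$

where $(a_1, r_1)$ is the inner surface. Applied between consecutive nodes, this gives $r_{i+1}^2 - r_i^2 = a_{i+1}^2 - a_i^2$, so once $r_1$ is known, every other radius follows.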

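Using linear two-node elements we have

$$a(\xi) = (1-\xi)a_1 + \xi a_2 = \boldsymbol{\phi}^T\mathbf{a}, \qquad r(\xi) = (1-\xi)r_1 + \xi r_2 = \boldsymbol{\phi}^T\mathbf{r}$$

with $\boldsymbol{\phi} = (1-\xi,\ \xi)^T$, $\mathbf{a} = (a_1, a_2)^T$ and $\mathbf{r} = (r_1, r_2)^T$, where vectors are displayed in boldface. Then we write

$$\lambda_2 = \frac{\boldsymbol{\phi}_{,\xi}^T\mathbf{r}}{\boldsymbol{\phi}_{,\xi}^T\mathbf{a}}, \qquad \lambda_3 = \frac{\boldsymbol{\phi}^T\mathbf{r}}{\boldsymbol{\phi}^T\mathbf{a}}$$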


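We select the unconstrained [10] energy density function

$$W = C_1\left(I_1 - 3I_3^{1/3}\right) + C_2\left(I_2 - 3I_3^{2/3}\right)$$

where

$$I_1 = 1 + \lambda_2^2 + \lambda_3^2, \qquad I_2 = \lambda_2^2 + \lambda_3^2 + \lambda_2^2\lambda_3^2, \qquad I_3 = \lambda_2^2\lambda_3^2$$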
Assuming an element height $h = 1$ (Figure 1), we have the internal energy for an element $e$ as

$$U = 2\pi\int_{a_1}^{a_2}\left[C_1\left(I_1 - 3I_3^{1/3}\right) + C_2\left(I_2 - 3I_3^{2/3}\right)\right]a\,da \qquad (5)$$

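The element gradient and tangent matrices then become

$$\mathbf{g} = 2\pi\int_{a_1}^{a_2}\left(W_{,2}\,\boldsymbol{\lambda}_{2,r} + W_{,3}\,\boldsymbol{\lambda}_{3,r}\right)a\,da, \qquad \boldsymbol{\lambda}_{i,r} = \frac{\partial\lambda_i}{\partial\mathbf{r}}$$

$$\mathbf{k} = 2\pi\int_{a_1}^{a_2}\left[W_{,22}\,\boldsymbol{\lambda}_{2,r}\boldsymbol{\lambda}_{2,r}^T + W_{,23}\left(\boldsymbol{\lambda}_{2,r}\boldsymbol{\lambda}_{3,r}^T + \boldsymbol{\lambda}_{3,r}\boldsymbol{\lambda}_{2,r}^T\right) + W_{,33}\,\boldsymbol{\lambda}_{3,r}\boldsymbol{\lambda}_{3,r}^T\right]a\,da$$

where $W_{,i} = \partial W/\partial\lambda_i$, $W_{,ij} = \partial^2 W/\partial\lambda_i\partial\lambda_j$, and

$$W_{,2} = 2(C_1 + C_2)\lambda_2 + 2C_2\lambda_2\lambda_3^2 - 2C_1\lambda_3(\lambda_2\lambda_3)^{-1/3} - 4C_2\lambda_3(\lambda_2\lambda_3)^{1/3}$$

$$W_{,3} = 2(C_1 + C_2)\lambda_3 + 2C_2\lambda_2^2\lambda_3 - 2C_1\lambda_2(\lambda_2\lambda_3)^{-1/3} - 4C_2\lambda_2(\lambda_2\lambda_3)^{1/3}$$

$$W_{,22} = 2(C_1 + C_2) + 2C_2\lambda_3^2 + \tfrac{2}{3}C_1\lambda_2^{-4/3}\lambda_3^{2/3} - \tfrac{4}{3}C_2\lambda_2^{-2/3}\lambda_3^{4/3}$$

$$W_{,23} = 4C_2\lambda_2\lambda_3 - \tfrac{4}{3}C_1\lambda_2^{-1/3}\lambda_3^{-1/3} - \tfrac{16}{3}C_2\lambda_2^{1/3}\lambda_3^{1/3}$$

and $W_{,33}$ follows from $W_{,22}$ by interchanging $\lambda_2$ and $\lambda_3$.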

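After changing variables $(a_1, a_2) \to (0, 1)$ and using one-point integration at $\xi = 1/2$, with $\bar a = (a_1 + a_2)/2$ and $L = a_2 - a_1$, we have

$$\mathbf{g} = 2\pi\bar a L\left(W_{,2}\,\boldsymbol{\lambda}_{2,r} + W_{,3}\,\boldsymbol{\lambda}_{3,r}\right), \qquad \boldsymbol{\lambda}_{2,r} = \frac{1}{L}\begin{pmatrix}-1\\ 1\end{pmatrix}, \qquad \boldsymbol{\lambda}_{3,r} = \frac{1}{2\bar a}\begin{pmatrix}1\\ 1\end{pmatrix} \qquad (7)$$

with $\mathbf{k}$ obtained in the same way and all quantities evaluated at $\xi = 1/2$. We can now quickly assemble global gradient and tangent matrices using equations (7).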
REDUCED GRADIENT AND TANGENT / UPDATE

Given the unconstrained internal energy function above and internal pressure $F$, the minimization problem defining deformed configurations is

$$\min\ \Pi = \sum_e U_e - \pi F\left(r_1^2 - a_1^2\right)$$

such that

$$r_2^2 - r_1^2 = a_2^2 - a_1^2, \qquad r_3^2 - r_2^2 = a_3^2 - a_2^2, \qquad \text{etc.} \qquad (8)$$


“3  ■  “2 


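We now linearize the constraint equations. That is, denoting the $i$th constraint of Eq. (8) by $V_i = 0$ and using the subscript $o$ for evaluation at the current configuration, we use

$$V_1 \approx V_{1o} + \left.\frac{\partial V_1}{\partial r_1}\right|_o \delta r_1 + \left.\frac{\partial V_1}{\partial r_2}\right|_o \delta r_2 + \cdots$$

$$V_2 \approx V_{2o} + \left.\frac{\partial V_2}{\partial r_2}\right|_o \delta r_2 + \left.\frac{\partial V_2}{\partial r_3}\right|_o \delta r_3 + \cdots \qquad \text{etc.} \qquad (9)$$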


In this one-dimensional problem there will be one free variable, $r_1$, with all others defined by the constraint equations (9). Since the current configuration satisfies Eq. (8), $V_{io} = 0$, and we find

$$\delta r_2 = \frac{r_1}{r_2}\,\delta r_1, \qquad \delta r_3 = \frac{r_2}{r_3}\,\delta r_2, \qquad \text{etc.} \qquad (10)$$

Solving (10) we find

$$\delta r_2 = \frac{r_1}{r_2}\,\delta r_1, \qquad \ldots, \qquad \delta r_n = \frac{r_1}{r_n}\,\delta r_1 \qquad (11)$$
Equation (11) can be used to compute the reduced global gradient and tangent consistent with the linearized constraint equations (9). We have

$$\hat g = \sum_{i=1}^{n} \alpha_i\,g_i, \qquad \hat k = \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i\,\alpha_j\,k_{ij} \qquad (12)$$

where $\alpha_i = r_1/r_i$, and $g_i$, $k_{ij}$ are from the global matrices.

Equation (12) is used to update $r_1$ using the Newton-Raphson method. The remaining variables are updated by sequentially solving the constraint equations.
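A compact sketch of the whole procedure for the thick cylinder follows. It is not the authors' code. It assumes one-point integration for the element terms, the potential $\Pi = \sum_e U_e - \pi F(r_1^2 - a_1^2)$, and constraints $r_i^2 - r_1^2 = a_i^2 - a_1^2$ solved sequentially for the eliminated radii. The function names, the test pressure `F` and the starting guess are hypothetical. Note that Eq. (12) as written omits the terms involving second derivatives of the eliminated radii $r_i(r_1)$, so the update below is not a full Newton step on the reduced potential.

```python
import numpy as np


def element_terms(a1, a2, r1, r2, C1, C2):
    """One-point (xi = 1/2) energy, gradient and tangent of a two-node element."""
    L, a_m, r_m = a2 - a1, 0.5 * (a1 + a2), 0.5 * (r1 + r2)
    l2, l3 = (r2 - r1) / L, r_m / a_m                  # radial and hoop stretches
    p = l2 * l3
    W = C1 * (1 + l2**2 + l3**2 - 3 * p**(2 / 3)) + C2 * (l2**2 + l3**2 + p**2 - 3 * p**(4 / 3))
    W2 = 2*(C1 + C2)*l2 + 2*C2*l2*l3**2 - 2*C1*l3*p**(-1/3) - 4*C2*l3*p**(1/3)
    W3 = 2*(C1 + C2)*l3 + 2*C2*l3*l2**2 - 2*C1*l2*p**(-1/3) - 4*C2*l2*p**(1/3)
    W22 = 2*(C1 + C2) + 2*C2*l3**2 + (2/3)*C1*l3**2*p**(-4/3) - (4/3)*C2*l3**2*p**(-2/3)
    W33 = 2*(C1 + C2) + 2*C2*l2**2 + (2/3)*C1*l2**2*p**(-4/3) - (4/3)*C2*l2**2*p**(-2/3)
    W23 = 4*C2*p - (4/3)*C1*p**(-1/3) - (16/3)*C2*p**(1/3)
    d2 = np.array([-1.0, 1.0]) / L                     # d(lambda2)/d(r1, r2)
    d3 = np.array([0.5, 0.5]) / a_m                    # d(lambda3)/d(r1, r2)
    vol = 2 * np.pi * a_m * L                          # 2 pi a da at the midpoint
    g = vol * (W2 * d2 + W3 * d3)
    k = vol * (W22 * np.outer(d2, d2) + W23 * (np.outer(d2, d3) + np.outer(d3, d2))
               + W33 * np.outer(d3, d3))
    return vol * W, g, k


def assemble(a, r, C1, C2, F):
    """Pi = sum U_e - pi F (r_1^2 - a_1^2), with its global gradient and tangent."""
    n = len(a)
    Pi = -np.pi * F * (r[0]**2 - a[0]**2)
    g, K = np.zeros(n), np.zeros((n, n))
    g[0] -= 2 * np.pi * F * r[0]                       # pressure work on inner surface
    K[0, 0] -= 2 * np.pi * F
    for e in range(n - 1):
        Ue, ge, ke = element_terms(a[e], a[e + 1], r[e], r[e + 1], C1, C2)
        Pi += Ue
        g[e:e + 2] += ge
        K[e:e + 2, e:e + 2] += ke
    return Pi, g, K


def radii_from_inner(a, r1):
    """Solve the constraints sequentially: r_i^2 - r_1^2 = a_i^2 - a_1^2."""
    return np.sqrt(r1**2 + a**2 - a[0]**2)


def solve_inner_radius(a, C1, C2, F, r1, tol=1e-8, max_iter=50):
    """Newton-Raphson on the single free variable r_1 using Eqs. (11)-(12)."""
    for it in range(max_iter):
        r = radii_from_inner(a, r1)
        _, g, K = assemble(a, r, C1, C2, F)
        alpha = r1 / r                    # Eq. (11): dr_i = (r_1 / r_i) dr_1
        g_red = alpha @ g                 # reduced gradient, Eq. (12)
        if abs(g_red) < tol:
            return r1, r, it
        k_red = alpha @ K @ alpha         # reduced tangent, Eq. (12)
        r1 = r1 - g_red / k_red
    raise RuntimeError("reduced Newton iteration did not converge")


a = np.linspace(7.0, 18.625, 11)          # ten elements, radii as in the example above
r1, r, iterations = solve_inner_radius(a, C1=80.0, C2=20.0, F=20.0, r1=7.0)
```

The eleven nodal radii collapse to one unknown, and the constraints hold exactly after every update because the eliminated radii are recomputed from $r_1$ rather than linearly extrapolated.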


RESULTS  AND  DISCUSSION 

We solved the infinite cylinder problem discussed by Oden [2]. In particular, the inner and outer radii of the cylinder were 7.0 inches and 18.625 inches, respectively. The Mooney-Rivlin constants $C_1$ and $C_2$ (Eq. (5)) were taken as 80.0 and 20.0 psi, and ten elements were used. The implicit variable elimination method reduced the eleven-variable unconstrained problem to ONE variable. A Lagrange multiplier method would require twenty-one variables when the ten element hydrostatic pressure variables are added. Figure 2 shows the convergence of the inner radius with respect to the Newton-Raphson steps. The initial configuration was a poor guess, but after two steps the log of the reduced gradient converged linearly; see Figure 3. The converged solution is the correct solution and is shown in Figure 4. In addition, we solved another one-dimensional problem involving the stretching of a rubber rod. Again, the reduced gradient converged linearly.

It is important to note the difficulties involved in extending this method to two-dimensional problems (our original intent). The superelements suggested by Needleman and Shih [11] are apparently unavoidable. The implicit variable elimination method would be very attractive if the reduced gradient and tangent matrices could be computed at the element level, that is, if internal element displacement variables could be eliminated using the linearization of the constraint equations. Then there would be no change in bandwidth, and the constrained gradient and tangent would be computed almost as quickly as in the compressible case. This method of eliminating internal variables fails because the eliminated variables cannot be updated so that the element volumes return to their original values. This is due to the interelement dependence of the constraint equations. One can carefully identify a set of global variables which can be eliminated and updated, but a system of nonlinear equations must then be solved at each Newton-Raphson iteration to enforce incompressibility exactly.
Figure 3. Log (reduced gradient) vs. Newton-Raphson iteration.
ABSTRACT


Three-dimensional analyses have been conducted of elastic-plastic continua which contain pairs of spherical particles and voids. The response to shear loading was investigated with the intention of characterizing the stress states at the microstructural level which result in void nucleation and softening, leading to shear strain localization in ultra-high strength steels.


INTRODUCTION. There is a great deal of evidence that ductile fracture of metallic alloys stems from the nucleation of voids at second phase microstructural particles. Nucleation occurs either when critical conditions at the interface are achieved or when the strength of the particle is reached, causing the particle to fracture. Either event produces local crack damage which deforms into a void as the plastic deformation of the sample proceeds. Plasticity theory has been applied to void deformation in the presence of triaxial tension, and the results have demonstrated that the void surface can experience strain levels far in excess of nominal values when the mean stress is above yield stress levels (Rice and Tracey (1)). Consequently, the voids grow and the material progressively weakens as neighboring voids coalesce by impingement in such stress environments. Gurson (2) has developed a plasticity constitutive theory, including a yield criterion and flow rule, to represent materials which dilate from the void growth mechanism. This theory is most properly applied to cases involving significant regions of high triaxial tension.

When  the  mean  stress  is  low  compared  to  the  yield  stress,  the  "void 
sheet"  mechanism  of  internal  damage  is  commonly  observed  in  planes  of  maximum 
shear,  Rogers  (3).  This  may  involve  nucleation  from  different  size  scale 
populations  of  particles.  For  instance,  pairs  of  relatively  large  voids 
nucleated  from  grain  refinement  particles  might  elevate  the  stress  and  strain 
fields  locally  to  nucleate  a  number  of  smaller  voids  from  strengthening 
particles  between  pairs.  Coalescence  would  occur  by  cracking  of  ligaments 
after  a  critical  spacing  is  achieved. 

In this report three-dimensional elastic-plastic results are given for the stress and strain fields that develop near void and particle pairs. The matrix material has been modeled as a non-hardening elastic-plastic metal, while the particles are considered to be elastic with a modulus twice that of steel. The results vividly demonstrate how nominally uniform shear conditions are perturbed near interacting inhomogeneities. Comparisons with plane strain solutions are made, and these demonstrate the importance of including three-dimensional effects in micromechanical computer simulations.

The analyses modeled a sample of metal, nominally under uniform shear
loading, containing one inhomogeneity pair (either a pair of voids or a pair
of particles) buried within the sample far from its boundaries. In this paper
we limit the discussion to spheres whose centers are three diameters apart.
Two orientations of the pair with respect to the direction of applied shear
were considered, as illustrated in Figure 1. As shown in the top quarter
section, one orientation has the applied shear directed parallel to the pair
centerline. The bottom quarter section illustrates the other orientation,
which has the applied shear directed perpendicular to the centerline.


NUMERICAL FORMULATION. A finite element formulation was employed in the
study to obtain fully plastic solutions within the small strain theory of
non-hardening plasticity. These solutions can be used to approximate the
conditions that would prevail near interacting voids and particles at the
point of incipient flow localization on the macroscale. Solutions
representing the large deformations which develop after localization has
initiated are not considered here.

Specifically, an incremental elastic-plastic finite element formulation
was used. The fully plastic solution, which provides the local flow field of
interest, is achieved numerically by incrementally tracing the loading
parameters (here the boundary displacement, as described below) from the
initial unstressed state. The approach consists of approximating the
undetermined displacement rate field with standard piecewise-defined finite
element interpolations. The primary discrete variables are nodal displacement
rates (increments), which are determined at each step of loading.

To achieve the desired uniform remote strain state, boundary nodes were
constrained to displace according to the specified state. These constraints
were imposed at each increment, and their magnitudes were maintained in fixed
proportions. If these specified displacement increments are denoted $u_1$
through $u_n$, the matrix equation for the vector of undetermined values
$\mathbf{u}$ is

$$\mathbf{K}\,\mathbf{u} = -\sum_{i=1}^{n} \mathbf{K}_i\, u_i$$
In this equation, $\mathbf{K}$ is the constrained stiffness matrix and
$\mathbf{K}_i$ are the stiffness columns corresponding to the specified
components. The stiffness terms vary according to the position of the
elastic-plastic boundary and the stress state (through the flow rule). An
implicit scheme is used at each step to average the flow rule at each
position within the plastic zone. The load history is discretized through an
adaptive incrementation procedure discussed by Tracey and Freese (4, 5). The
planar and three-dimensional versions of this formulation are embodied in the
MTL FORTRAN code EPFE, which was used in this study.
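The constrained equation is the usual partitioning of a stiffness system into free and prescribed degrees of freedom. The following is a minimal NumPy sketch, assuming no external nodal forces on the free degrees of freedom (loading enters only through the prescribed boundary displacements, as described above) and using a hypothetical three-spring chain in place of an actual finite element stiffness. In the incremental elastic-plastic setting the stiffness would be reassembled at every increment as the plastic zone changes; the sketch solves a single linear increment.

```python
import numpy as np


def solve_with_prescribed(K_full, prescribed):
    """Solve K_full @ u = 0 on the free DOFs, given prescribed DOF values.

    K_full     : (N, N) assembled stiffness matrix before constraints
    prescribed : dict {dof index: prescribed displacement increment}
    Returns the complete displacement-increment vector.
    """
    n = K_full.shape[0]
    p_list = sorted(prescribed)
    f_list = [i for i in range(n) if i not in prescribed]
    p_idx = np.array(p_list, dtype=int)
    f_idx = np.array(f_list, dtype=int)
    u_p = np.array([prescribed[i] for i in p_list], dtype=float)

    K_ff = K_full[np.ix_(f_idx, f_idx)]  # constrained stiffness matrix K
    K_fp = K_full[np.ix_(f_idx, p_idx)]  # columns K_i of the specified components

    u = np.zeros(n)
    u[p_idx] = u_p
    u[f_idx] = np.linalg.solve(K_ff, -K_fp @ u_p)  # K u = -sum_i K_i u_i
    return u


def spring_chain_stiffness(n_springs, k=1.0):
    """Hypothetical test problem: n_springs equal springs in series."""
    K = np.zeros((n_springs + 1, n_springs + 1))
    for e in range(n_springs):
        K[e:e + 2, e:e + 2] += k * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return K


# Fix node 0 and prescribe a displacement increment of 0.3 at node 3.
u = solve_with_prescribed(spring_chain_stiffness(3), {0: 0.0, 3: 0.3})
print(u)  # approximately [0, 0.1, 0.2, 0.3]: interior nodes follow linearly
```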

The treatment of a pair of inhomogeneities can be contrasted with
formulations which have considered periodic arrangements. Figure 2
illustrates typical idealizations used in plane strain and axisymmetric
analyses. The pair model would appear to allow a more realistic assessment of
interaction in actual microstructures. If loadings are restricted to
tractions perpendicular to the model (unit cell) boundaries, discretization
requirements are essentially the same for the periodic arrangements and the
pair configuration. Results presented below suggest that the cylindrical
geometry is of limited usefulness and that the spherical geometry should be
modeled instead. The axisymmetric formulation displayed at the bottom of
Figure 2 treats spheres but suffers from the requirement that the centerline
of the spheres must be a principal stress direction. The three-dimensional
pair model employed in this work allows treatment of any applied loading,
including the cases of interest, which have the centerline in the plane of
maximum shear.

In Figure 1, if the coordinate axes are centered between the spheres, the
planes x = 0, y = 0, and z = 0 serve as planes of reflective symmetry of the
model. Geometrically, the total region can be viewed as an assembly of eight
identical subregions, each containing a single quarter sphere. The regions
displayed in Figure 1 are unions of two of these elemental subregions; in
fact, only an interior part of the model is displayed. The total region had
dimensions 13 x 10 x 10 relative to the sphere diameter D. By noting
conditions of skew anti-symmetry in simple shear, it was possible to perform
the analysis by discretizing a single subregion (octant) of the total model.

If the entire region were to be modeled, the simple shear state would be
enforced in the top problem of Figure 1 in the following way. The two yz
boundary faces would have the x displacement varying linearly with y, and on
these faces the y component of displacement would be zero. The xz faces
would have a constant value of the x displacement and a zero value of the y
displacement. The z component of traction would be zero on these four faces,
corresponding to zero-valued xz and yz shear stresses. Finally, the xy faces
would be completely traction free.

When the skew anti-symmetry conditions are invoked on the planes of
geometric symmetry, the following boundary conditions produce the state of
nominal simple shear. In the top problem, on x = 0 the y component of
displacement and the x and z components of traction are zero. On y = 0, the
x component of displacement and the y and z components of traction are zero.
Finally, on z = 0 the z displacement and the x and y tractions are zero.
Similar conditions can be applied to the faces of the elemental octant in the
bottom problem, where the applied shear is directed perpendicular to the
centerline.

The finite element mesh used over the octant consisted of constant
strain tetrahedra. The mesh was generated by first developing a field of
eight-node brick elements, which were individually subdivided into five
tetrahedra. The mesh refinement was different in the analyses of the two void
problems. The case of parallel shear had a mesh consisting of 4500 elements
and 1200 nodes, each node with three degrees of freedom. The perpendicular
shear analysis was more refined, with 7100 elements and 1800 nodes in the
mesh. The analysis of the pair of particles was conducted using the refined
mesh for both load orientations. The additional complexity in the particle
analysis involved discretization of the particles themselves. The quarter
particle appearing in the octant was represented by 1300 elements, giving a
total mesh of 8400 elements and 2000 nodes.
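The brick-to-tetrahedron subdivision and the constant-strain property of the elements can be illustrated with a short sketch. The particular split below (four corner tetrahedra plus one central tetrahedron joining alternate corners) is one standard five-tetrahedron choice, assumed here for illustration; the text does not state which split was used. With a five-tetrahedron split, adjacent bricks must be mirrored in a checkerboard pattern so that the face diagonals match. The check imposes the simple-shear field u_x = gamma * y on the brick nodes and confirms that every tetrahedron reproduces the uniform shear strain eps_xy = gamma/2 exactly. For reference, 4500 and 7100 tetrahedra correspond to 900 and 1420 parent bricks.

```python
import numpy as np

# Corners of a unit brick, numbered in the usual hexahedron order.
BRICK = np.array([
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
    [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1],
], dtype=float)

# One five-tetrahedron split: four corner tetrahedra (at corners 1, 3, 4, 6)
# and a central tetrahedron on the alternate corners 0, 2, 5, 7.
FIVE_TETS = [(1, 0, 2, 5), (3, 0, 7, 2), (4, 0, 5, 7), (6, 2, 7, 5), (0, 2, 5, 7)]


def tet_volume(p):
    """Volume of a tetrahedron given its (4, 3) vertex coordinates."""
    return abs(np.linalg.det(p[1:] - p[0])) / 6.0


def tet_strain(p, u):
    """Constant small-strain tensor of a linear (constant strain) tetrahedron.

    The displacement gradient G satisfies u_k - u_0 = G (x_k - x_0), k = 1, 2, 3.
    """
    dX = (p[1:] - p[0]).T  # columns: edge vectors
    dU = (u[1:] - u[0]).T  # columns: relative nodal displacements
    G = dU @ np.linalg.inv(dX)
    return 0.5 * (G + G.T)


gamma = 0.01
u_nodes = np.zeros_like(BRICK)
u_nodes[:, 0] = gamma * BRICK[:, 1]  # simple shear: u_x = gamma * y

volumes = [tet_volume(BRICK[list(t)]) for t in FIVE_TETS]
strains = [tet_strain(BRICK[list(t)], u_nodes[list(t)]) for t in FIVE_TETS]
print(round(float(sum(volumes)), 12))                 # 1.0
print([round(float(e[0, 1]), 12) for e in strains])   # 0.005 in every tetrahedron
```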
VOID PAIR INTERACTIONS. The elastic solution for an isolated spherical void
in simple shear has been described by Love (6). Referring to Figure 1, the
maximum stress occurs at the two points on the void surface in the xz plane
with tangent in the direction of applied shear. For a Poisson's ratio of
0.3, the stress concentration factor at these locations is 1.91, suggesting
that void surface yielding should commence when the nominal shear strain
equals 1/1.91 = 0.52 times the material's yield strain in shear.
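The quoted factor can be checked against the classical closed-form elasticity result for a spherical cavity under remote pure shear, obtained by superposing the cavity solutions for uniaxial tension and uniaxial compression: K = 15(1 - nu)/(7 - 5 nu). At the hot spot the radial stress vanishes and the two tangential stresses are +K tau and -K tau, so the surface is locally in pure shear and (under either the Mises or the Tresca criterion) yields when K tau equals the shear yield stress. The sketch below, with hypothetical function names, evaluates this for nu = 0.3; because the response is linear up to first yield, the stress ratio carries over directly to strains.

```python
def shear_scf_spherical_cavity(nu: float) -> float:
    """Peak surface stress / remote shear stress for an isolated spherical
    cavity in an infinite linear elastic solid under remote pure shear."""
    return 15.0 * (1.0 - nu) / (7.0 - 5.0 * nu)


def onset_of_surface_yield(nu: float) -> float:
    """Remote shear strain, as a fraction of the shear yield strain, at which
    the cavity surface first yields (elastic response up to that point)."""
    return 1.0 / shear_scf_spherical_cavity(nu)


print(f"{shear_scf_spherical_cavity(0.3):.3f}")  # 1.909
print(f"{onset_of_surface_yield(0.3):.3f}")      # 0.524
```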

Four stages of the elastic-plastic solution are illustrated in Figure 3
for a pair of spherical voids spaced three diameters apart under a remote
shear directed parallel to the centerline. Plastic zones are represented in a
quadrant by the tetrahedral elements which have met the yield condition at
the load level indicated. As anticipated from the classical elasticity
solution, yielding first occurs in this quadrant at the void surfaces 90° from
the pair centerline in the xz plane. As the load is increased, plasticity
spreads from these locations. In the top left, corresponding to a remote
strain of 0.80 times the yield strain, most of the void surfaces have yielded,
but there is no plasticity between the voids. Significant yielding between
the voids has occurred at 94% of yield, as shown in the bottom left. At 96% of
yield (bottom right), the separate plastic zones have merged, opening the way
for a mechanism of extensive plastic straining between the voids.

The strain intensification that occurs along the centerline of the void
pair is summarized in Figure 4. Data are plotted for the two spherical void
pair problems and also for the cylindrical void pair problem that has been
discussed by Tracey, Freese, and Perrone (7). These problems are individually
considered in the top two plots and the bottom left plot of Figure 4. The
results of the three problems are contrasted in the bottom right plot, which
has peak local strain plotted against nominal strain level.

The component of strain that is plotted for each case corresponds to the
nominal simple shear state, e.g., the yz component for the top left problem.
The data are presented relative to the material's yield strain in shear. The
distributions along the centerline are plotted for x/D values from 0.5 to
2.5, measured from the center of one void, which spans the ligament between
the two void surfaces.

When the applied shear is directed perpendicular to the pair centerline
(top left), the centerline strain maxima occur on the void surfaces. The
results for incipient yield (nominal strain = 0.49 times the yield strain)
demonstrate the extremely localized effects of inhomogeneities in an elastic
field. As can be seen, the elastic solution has the strain elevated over the
nominal value only within one void radius of the void surfaces. Hence, there
is effectively no interaction in the elastic pair problem with a center
spacing of three diameters. Consistent with Love's (6) isolated void result,
the strain maxima are approximately twice the nominal value before plastic
yielding intervenes. At general yield, the strain maxima have increased to
about three times the nominal value, and interaction is evident, with
mid-centerline strain magnitudes significantly exceeding the nominal value.

The analysis of the spherical void pair with shear parallel to the
centerline was conducted using a mesh that was too coarse to capture
adequately the shear-free condition which holds at x = 0.5D and 2.5D.
Nonetheless, the character of the elastic-plastic solution is thought to be
reasonably represented in the top right plot. As in the other case, the
elastic solution shows strain variations only within one radius of the void
surfaces, with the nominal strain value realized over the middle half of the
span between the voids. At general yield, the strain exceeds the nominal
value over the entire ligament. The plot shows a modest peak roughly 3/4 of a
radius from the surfaces, with strain levels roughly 30% above the nominal
strain.

The cylindrical void pair analysis shows strain amplification levels
greatly exceeding those found in the spherical void analyses. For this plane
strain case, the mesh refinement was adequate to capture the shear-free
conditions at x = 0.5D and 2.5D. The elastic solution shows a strong gradient
out to a distance of roughly 1/2 of a radius from the surfaces, otherwise
reaching a nearly uniform state between the voids. Interaction is evident in
this problem even in the elastic regime, with the strain between the voids
approximately 50% above the nominal value before yielding occurs. Plastic
zones develop at the void surfaces and, separately, in the center of the
ligament for this problem. When these zones link, distinct strain maxima
develop at positions roughly 3/4 of a void radius from the surfaces. The
strain intensification is seen to increase in severity as general yield
conditions are approached.

The three solutions are compared in the bottom right plot of Figure 4.
Curves show the variation of local peak strain for each case as a function of
nominal strain level. Of the two spherical pair cases, it can be seen that
the orientation perpendicular to the load induces the highest local strains.
Nonetheless, a comparison of the top left and top right plots shows that the
strain level attained in the middle of the centerline is essentially
independent of orientation. The strain magnitudes found in the cylindrical
void pair case are intermediate between the spherical pair results when the
plastic zone size is small. As can be seen, at approximately 75% of general
yield, corresponding to extensive local yielding, the peak strain values begin
to exceed those found in the spherical pair cases. At general yield, the
local strains and strain rates for this case greatly exceed the values found
for the spherical void problems.

It is the strain rate field that is most useful in assessing the local
intensification of the nominal state once general yield conditions are
achieved. Before general yield this field continually changes as the plastic
zones change, but thereafter, within the small deformation and nonhardening
assumptions, the field remains constant relative to the nominal value.
Figure 5 illustrates contours of shear strain rate (normalized by the nominal
strain rate) for the perpendicular loading. Results are given for nominal
strain levels before (0.94) and after (1.03) general yield. The contours are
drawn over a quadrant of the xz midplane of the model. It is apparent that in
each case the maximum strain rate occurs at the point of strain concentration
at the void surface on the centerline.

At the lower load level, the maximum strain rate is approximately 4.3
times the nominal value. The gradient is steep, with the middle of the
centerline experiencing a modest value of approximately 1.5. Little
interaction is apparent at this nominal strain, as the 4.3 value holds on the
opposite side of the void surface as well as at the surface-centerline
intersection point. At the higher load level, interaction is suggested by a
maximum of 10.0 and the somewhat lower value of 8.7 on the opposite side of
the void surface. In this case, the mid-centerline strain rate is 4.5 times
the nominal value, indicating significantly elevated strain rates along the
entire centerline after general yield is achieved.


PARTICLE PAIR INTERACTION. The void pair analyses obviously neglect the
presence of nucleating particles, and thus are applicable to the study of
post-nucleation effects resulting from the creation of interior traction-free
surfaces. The field near perfectly bonded elastic spherical particles was
studied by performing an elastic-plastic finite element analysis which
modeled the particles as elastic with infinite yield strength and a modulus
twice the elastic modulus of the elastic-plastic matrix in which they reside.
As in the void pair analyses, a particle pair with a three-diameter spacing
was considered.

The strain intensification is plotted in Figure 6 for the two loading
orientations considered above. Curves display the strain distribution
through the particles and along the centerline between them. In these
problems incipient yield was found to occur at a nominal shear strain equal to
77% of the shear yield strain. Eshelby's (8) analysis of isolated ellipsoidal
particles in elastic fields demonstrated a uniform strain state within the
particles. The finite element results displayed in Figure 6 agree with this
result and show a nearly uniform state within the particles even after
extensive matrix yielding has occurred. At incipient yield the shear strain of
the particles is approximately 50% of the shear yield strain of the matrix,
consistent with the difference in elastic moduli.
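The 50% figure can be rationalized with Eshelby's uniform-strain result for an isolated spherical inhomogeneity. For a remote shear (deviatoric) strain, the strain inside a perfectly bonded elastic sphere is eps_in = eps_inf / [1 + beta (mu_p/mu - 1)], where mu_p/mu is the particle-to-matrix shear modulus ratio and beta = 2(4 - 5 nu)/(15(1 - nu)) depends only on the matrix Poisson's ratio nu. The sketch assumes nu = 0.3 and takes the shear-modulus ratio equal to the stated modulus ratio of 2 (that is, it assumes equal Poisson's ratios in particle and matrix, which the text does not state); it ignores particle interaction and matrix plasticity, and the function names are hypothetical.

```python
def eshelby_beta(nu: float) -> float:
    """Deviatoric component of Eshelby's tensor for a sphere in a matrix
    with Poisson's ratio nu."""
    return 2.0 * (4.0 - 5.0 * nu) / (15.0 * (1.0 - nu))


def particle_shear_strain_ratio(mu_ratio: float, nu: float) -> float:
    """Uniform shear strain in a bonded elastic sphere / remote shear strain.

    mu_ratio : particle shear modulus / matrix shear modulus
    nu       : matrix Poisson's ratio
    """
    return 1.0 / (1.0 + eshelby_beta(nu) * (mu_ratio - 1.0))


ratio = particle_shear_strain_ratio(mu_ratio=2.0, nu=0.3)
nominal_at_incipient_yield = 0.77  # fraction of matrix shear yield strain (from the text)
print(f"{ratio:.3f}")                               # 0.677
print(f"{ratio * nominal_at_incipient_yield:.3f}")  # 0.522
```

The isolated-particle estimate of about 0.52 of the matrix shear yield strain is close to the approximately 50% found in the finite element results.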

In the left plot of Figure 6, for the orientation perpendicular to the
shear load, it can be seen that the distribution is continuous across the
particle/matrix interface. The nominal value of shear strain is reached at
the middle of the centerline. For this orientation, there is essentially no
strain intensification over the nominal value along the particle pair
centerline.


The right plot of Figure 6 displays the shear strain intensification for
particles oriented in the direction of the applied load. At incipient yield
the magnitude of strain in the particles is slightly higher than 50% of the
shear yield strain of the matrix, and at a load slightly greater than general
yield it is approximately 70% of the shear yield strain. Across the interface
the shear strain is discontinuous and jumps from a subnominal value in the
particle to the maximum value found in the matrix. At incipient yield, this
maximum value of strain equals the shear yield strain of the matrix. The
severe gradient shows a decrease to the nominal value of shear strain within
one half of a particle radius into the matrix. As loading progresses, the
shear strain rate intensifies on the matrix side of the interface,
corresponding to the occurrence of extensive plastic deformation.


SUMMARY. Results have been presented for the three-dimensional aspects of
the interaction of pairs of voids and particles in shear. While the work has
been motivated by metallurgical needs, particularly the need to develop
microstructures that delay void nucleation, clearly much remains to be done
to guide alloying from a mechanics basis. Future work on pair interaction in
shear must address void nucleation, the spacing issue, and a more complete
assessment of orientation effects. Ultimately, the goal is to consolidate the
simulation features so that the necessary data and methodology will be
available to allow microstructural design for ultra-high strength and
toughness.
Figure 1. Quarter section of inner region containing void/particle pair
under far field simple shear loading. In top drawing, shear load is directed
parallel to pair centerline. Bottom drawing has shear load perpendicular to
centerline.
Figure 2. Unit cells (shaded) for two-dimensional analysis of interaction
in periodic void arrangements and biaxial loading. Top and middle: circular
cylindrical voids in plane strain; bottom: spherical voids under axisymmetric
loading.


FIGURE 3. Plastic zone growth near spherical void pair before general
yield conditions are achieved for shear parallel to centerline of voids.


Figure 5. Contour plots of shear strain rate on midplane quadrant shaded in
sketch, for voids oriented perpendicular to shear loading. Results are at
nominal strain levels of 0.94 (left)
ABSTRACT. The swage autofrettage process is often used to produce
favorable residual stresses in tubes. In this paper a finite element analysis
of the swage autofrettage process is reported. The nonlinear finite element
program ABAQUS is used to obtain numerical results for the displacements,
strains, and stresses in the tube during and after autofrettage. Approximate
solutions are obtained for one- and two-dimensional models of tubes pressed by
a rigid or elastic mandrel. The effects of longitudinal bending and of mandrel
elasticity on the permanent bore enlargement and the residual stresses are
discussed.

INTRODUCTION. To increase the maximum elastic carrying capacity and to
enhance the fatigue life, residual stresses are often produced in tubes
through autofrettage [1]. Many solutions have been reported for the
hydraulic autofrettage process, in which a thick-walled cylinder is subjected
to a uniform internal pressure of sufficient magnitude to cause plastic
deformation, after which the pressure is removed.
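For orientation, the pressure at which hydraulic autofrettage first produces plastic deformation follows from the Lamé solution for an elastic thick-walled cylinder: yielding starts at the bore, and Tresca's criterion sigma_theta - sigma_r = sigma_y there gives p_y = (sigma_y/2)(1 - a^2/b^2). This assumes the axial stress is the intermediate principal stress (as it is for plane strain or closed ends). The sketch below, with hypothetical function names, evaluates the expression for the wall ratio b/a = 1.431 used in the finite element models of this paper, in units of the yield stress.

```python
def lame_bore_stresses(p: float, a: float, b: float):
    """Radial and hoop stresses at the bore r = a of an elastic thick-walled
    cylinder (inner radius a, outer radius b) under internal pressure p."""
    k2 = (b / a) ** 2
    return -p, p * (k2 + 1.0) / (k2 - 1.0)


def tresca_first_yield_pressure(a: float, b: float, sigma_y: float) -> float:
    """Internal pressure at which sigma_theta - sigma_r first reaches sigma_y
    at the bore: p = (sigma_y / 2)(1 - a^2/b^2)."""
    return 0.5 * sigma_y * (1.0 - (a / b) ** 2)


p_y = tresca_first_yield_pressure(a=1.0, b=1.431, sigma_y=1.0)
s_r, s_t = lame_bore_stresses(p_y, a=1.0, b=1.431)
print(f"{p_y:.4f}")        # 0.2558
print(f"{s_t - s_r:.4f}")  # 1.0000, i.e. the Tresca limit is just reached
```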

A more economical way of producing residual stresses in thick-walled
cylinders is the swage autofrettage process. This process is carried out with
a mandrel whose diameter is greater than the inner diameter of the tube. The
mandrel is driven through the tube from one end to the other. A rigorous
analysis of this process is difficult. Recently, a simple analysis of the
swage autofrettage process was reported [2]. The model used was a
one-dimensional plane-strain model of the mandrel-tube assembly. The steel
tube was assumed to be elastic-ideally plastic, obeying Tresca's yield
criterion and the associated flow theory, while the tungsten carbide mandrel
was elastic. The deformation and stress distribution during swaging were
obtained by solving the shrink-fit problem beyond the elastic limit. After
swaging, the permanent bore enlargement and residual stresses were calculated
by an unloading analysis [2], taking into account the Bauschinger effect and
the strain-hardening during unloading [3].

The solution reported in [2] is in closed form, and the numerical results
indicate that the agreement between the calculated and experimental data is
excellent in zones with larger wall ratios but not as good in zones with wall
ratios less than two. The differences in thinner sections may be due to the
longitudinal bending effect, since the simplified analytical analysis is
one-dimensional and bending is neglected. To determine the longitudinal
effect, a two-dimensional analysis based on the finite element method is made
here.

In this paper, finite element solutions are presented for both one- and
two-dimensional models, and a comparison of the results is given.
METHOD OF ANALYSIS. Since the total length of the tube is about sixty
times the diameter of the mandrel, a complete finite element analysis of the
swage autofrettage process is very difficult. As the mandrel is driven through
the tube from one end to the other, the simulation requires the study of
elastic-plastic moving contact and the separation history between two
deforming bodies. In addition, a considerable amount of computer storage and
run time would be required. In the present study, however, approximate finite
element models are chosen to represent swaging in only a part of the tube
(zone 3). We consider the process as quasi-static and neglect the effects of
sliding and friction between the mandrel and tube. We want to obtain
information about the deformations and stresses for a section at only two
particular stages, i.e., when the mandrel is at the position of interest and
when it is far away from it. To achieve this, we can simplify the simulation
by studying two related problems, i.e., shrink-fit and complete unloading.
When the mandrel is at the position of interest, we consider a shrink-fit
problem of the mandrel-tube assembly to obtain the maximum deformation and
stresses during swaging. When the mandrel has been driven far away from the
section, we study it as a complete unloading problem of the mandrel-tube
assembly to obtain the permanent bore enlargement and residual stresses after
swaging. Figure 1 shows a one-dimensional interference-fit problem of the
mandrel-tube assembly. Initially, the inner and outer radii of the tube are a
and b, and the outer radius of the mandrel is c. Given the interference
I = c - a, we can determine the interference pressure p and the deformation
and stresses in the mandrel and tube. In general, this problem can be solved
only by an iterative approach. If the mandrel were rigid, the direct approach
using displacement constraints could be applied. Results based on this
approach were also obtained so that the effect of elasticity in the mandrel
could be assessed. The actual strength ratio of tungsten carbide to steel is
about three. For the problem considered here, it is reasonable to assume that
the steel tube is elastic-plastic, obeying Mises' yield criterion and the
associated flow theory, while the tungsten carbide mandrel remains elastic.
The finite element analysis is carried out with the nonlinear program ABAQUS
[4]. The two types of elements used are shown in Figure 2. The axisymmetric
solid elements (CAX4) are used to model the tube and mandrel. The interface
elements (INTER2A) are used to model the separation or interference fit
between the mandrel and tube. Truss elements (CID2) can also be used to model
the mandrel, because the displacement $u_1$ is directly related to the
external pressure p by

$$\frac{u_1}{c} = -\left(1 - \nu_1 - 2\nu_1^2\right)\frac{p}{E_1}$$

where $E_1$ and $\nu_1$ are the elastic constants of the mandrel.
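While both parts remain elastic, the mandrel relation above combines with the Lamé solution for the tube into a single linear compatibility condition: the interference is taken up partly by the bore enlargement u_a = p a (1 + nu)[(1 - 2 nu) a^2 + b^2] / [E (b^2 - a^2)] and partly by the mandrel contraction |u_1| = (1 - nu_1 - 2 nu_1^2) p c / E_1. The sketch below is a plane-strain, small-strain estimate that is valid only below first yield; it uses the dimensionless constants of the models described in the next section and half the maximum interference, and its function names are hypothetical. It indicates how much the elasticity of the mandrel lowers the contact pressure relative to a rigid mandrel.

```python
from typing import Optional


def tube_bore_compliance(a: float, b: float, E: float, nu: float) -> float:
    """Radial bore displacement per unit internal pressure (elastic tube, plane strain)."""
    return (1.0 + nu) * a * ((1.0 - 2.0 * nu) * a ** 2 + b ** 2) / (E * (b ** 2 - a ** 2))


def mandrel_compliance(c: float, E1: float, nu1: float) -> float:
    """Radial contraction per unit external pressure of a solid elastic mandrel
    (plane strain): |u_1| / p = (1 - nu1 - 2 nu1^2) c / E1."""
    return (1.0 - nu1 - 2.0 * nu1 ** 2) * c / E1


def interference_pressure(I: float, a: float, b: float, E: float, nu: float, c: float,
                          E1: Optional[float] = None, nu1: float = 0.0) -> float:
    """Elastic contact pressure for interference I; the mandrel is rigid if E1 is None."""
    C = tube_bore_compliance(a, b, E, nu)
    if E1 is not None:
        C += mandrel_compliance(c, E1, nu1)
    return I / C


a, b, c = 1.0, 1.431, 1.007415
I_half = 0.5 * (c - a)
p_rigid = interference_pressure(I_half, a, b, 200.0, 0.3, c)
p_elastic = interference_pressure(I_half, a, b, 200.0, 0.3, c, E1=590.0, nu1=0.258)
print(f"rigid mandrel:   p = {p_rigid:.3f}")                      # 0.244
print(f"elastic mandrel: p / p_rigid = {p_elastic / p_rigid:.2f}")  # 0.94
```

On this elastic estimate, the carbide mandrel's compliance lowers the contact pressure by roughly 6% compared with a rigid mandrel at the same interference.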

FINITE ELEMENT MODEL. Figure 3 shows a two-dimensional finite element
model (E3) chosen to represent the swaging process in zone 3. The model is
considered symmetric with respect to z = 0, so only half of the model is
shown. We have used 133 and 21 elements of type CAX4 to represent the tube and
mandrel, respectively, with a = 1, b = 1.431, and c = 1.007415. There are
eight interface elements of type INTER2A to represent the interaction between
the tube and mandrel. Figure 3a shows an interference-fit problem of the
mandrel-tube assembly; this model is used to determine the maximum
deformation and stresses during swaging. Figure 3b shows a complete unloading
problem, when the two parts are separated; this problem is used to determine
the permanent deformation and residual stresses after swaging. To determine
the longitudinal bending effect, we compare the two-dimensional analysis with
one-dimensional analyses. The one-dimensional model (E1) consists of ten
elements (of type CAX4) each for the tube and mandrel, with one interface
element (of type INTER2A). Another one-dimensional model (E2) replaces the
mandrel with one or two truss elements of type CID2. The material constants
used are $E = 200$, $\sigma_0 = 1$, $\nu = 0.3$ for the high-strength steel and
$E_1 = 590$, $\sigma_1 = 3.33$, $\nu_1 = 0.258$ for the tungsten carbide. The
materials exhibit no strain-hardening. In the modeling and computation we have
used dimensionless quantities, with the inner radius (2.283 inches) as the
unit length and the initial yield stress (150 ksi) as the unit stress. The
actual quantities can easily be obtained if needed.
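Converting back amounts to scaling lengths by 2.283 in and stress-like quantities (stresses, pressures, elastic moduli) by 150 ksi, while strains and Poisson's ratios are unchanged. A small sketch with hypothetical names shows, for instance, that the dimensionless moduli 200 and 590 correspond to 30,000 ksi and 88,500 ksi.

```python
LENGTH_UNIT_IN = 2.283   # tube inner radius, inches (unit length)
STRESS_UNIT_KSI = 150.0  # initial yield stress of the steel, ksi (unit stress)


def to_physical(value: float, kind: str) -> float:
    """Convert a dimensionless model quantity back to inches or ksi.

    kind: 'length' (radii, displacements, interference), 'stress'
    (stresses, pressures, elastic moduli) or 'ratio' (strains, Poisson's ratio).
    """
    scale = {"length": LENGTH_UNIT_IN, "stress": STRESS_UNIT_KSI, "ratio": 1.0}
    return value * scale[kind]


print(round(to_physical(1.431, "length"), 4))     # outer radius b, in: 3.267
print(round(to_physical(0.007415, "length"), 5))  # interference c - a, in: 0.01693
print(round(to_physical(200.0, "stress")))        # steel E, ksi: 30000
print(round(to_physical(590.0, "stress")))        # carbide E_1, ksi: 88500
```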

In the above three models (E1, E2, E3), the tube is elastic-plastic, but
the mandrel remains elastic. If the strength ratio of the mandrel material to
the tube material is very large, the mandrel can be regarded as rigid. To
determine the effect of elasticity in the mandrel, we have chosen three
further finite element models (R1, R2, R3). Models R1 and R2 represent
one-dimensional plane-strain and plane-stress cases, respectively; in both,
ten elements of type CAX4 represent the tube. Model R3 is the same as model
E3 shown in Figure 3, except that the mandrel is replaced by a rigid block.

Following the instructions given in Reference [4], we have prepared the
input data for each of the six finite element models. For each model the
problem was run in two steps, i.e., loading and unloading. The input deck for
the finite element analysis of model E1 is shown in Table 1.

RESULTS  AND  DISCUSSIONS.  For  each  of  the  six  models  (Rl,  R2,  R3,  El,  E2, 
E3)  discussed  in  the  preceding  section,  we  have  run  the  finite  element  program 
successfully.  The  numerical  results  for  the  displacements,  strains,  and 
stresses  in  the  tube  during  and  after  swaging  have  been  obtained.  Only  the 
results  for  the  stresses  along  the  radial  direction  near  z  =  0  and  the  displace¬ 
ments  along  the  bore  are  presented  graphically. 

When the mandrel is assumed to be rigid, the displacement at the bore is equal to the given interference. The stresses based on models (R1, R2, R3) are presented in Figures 4 through 6. When the interference is only half of the maximum, the stresses remain elastic, as shown in Figures 4 and 5. When the maximum interference (I = 0.007415) is reached, the state of stress is elastic-plastic. The effect of interference on the distributions of hoop and axial stresses can be seen in Figures 4 and 5, respectively. Comparing the results for model R1 (one-dimensional, plane-strain case) with those for model R3 (two-dimensional case) also shows the influence of the longitudinal effect on the hoop and axial stresses; the influence on the maximum axial stresses is very significant, as shown in Figure 5. Unloading after the maximum interference is reached gives the residual axial and hoop stresses shown in Figures 5 and 6, respectively. A comparison of these residual stresses indicates that the differences between the one- and two-dimensional models (R1 and R3) are very minor. Models R1 and R2 represent plane-strain and plane-stress cases, respectively, and both models are one-dimensional.
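The statement that half the interference keeps the tube elastic while the full interference produces plastic flow can be checked against a purely elastic Lamé estimate for the one-dimensional plane-strain case (model R1). The sketch below is an illustration only, not part of the reported analysis: it assumes a rigid mandrel, small strains, Mises yielding with the yield stress 1. given for the steel in Table 1, and the radii (1.0 and 1.431) and elastic constants (2.E2, .3) read from that deck. The function names are hypothetical.

```python
import math

def lame_bore_pressure(delta, a, b, E, nu):
    """Internal pressure that enlarges the bore (radius a) of an elastic
    tube (outer radius b) by delta, in plane strain (Lame solution)."""
    ratio = (a**2 + b**2) / (b**2 - a**2)
    # Radial bore displacement per unit internal pressure
    compliance = a * (1 + nu) / E * ((1 - nu) * ratio + nu)
    return delta / compliance

def bore_stresses(p, a, b, nu):
    """Radial, hoop and axial (plane strain) stresses at the bore."""
    sigma_r = -p
    sigma_t = p * (a**2 + b**2) / (b**2 - a**2)
    sigma_z = nu * (sigma_r + sigma_t)
    return sigma_r, sigma_t, sigma_z

def von_mises(s1, s2, s3):
    return math.sqrt(0.5 * ((s1 - s2)**2 + (s2 - s3)**2 + (s3 - s1)**2))

# Steel tube data read from the Table 1 deck (deck units)
A, B, E, NU, YIELD = 1.0, 1.431, 200.0, 0.3, 1.0
I_MAX = 0.007415

for frac in (0.5, 1.0):
    p = lame_bore_pressure(frac * I_MAX, A, B, E, NU)
    ratio = von_mises(*bore_stresses(p, A, B, NU)) / YIELD
    print(f"I = {frac:.1f} I_max: bore pressure {p:.4f}, Mises/yield {ratio:.2f}")
```

A Mises ratio below 1 means the elastic solution is admissible; above 1 the elastic estimate breaks down and an elastic-plastic analysis such as the finite element runs is required.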
TABLE 1. THE FINITE ELEMENT INPUT DECK FOR MODEL E1

*HEADING
TUBE-MANDREL ASSEMBLY AND SEPARATION
** (r, z) coordinates: axis r = 0, mandrel surface r = 1.007415,
** tube bore r = 1.0, tube outside r = 1.431
*NODE
1,0.
11,1.007415
21,1.0
31,1.431
101,0.,0.05
111,1.007415,0.05
121,1.0,0.05
131,1.431,0.05
*NGEN,NSET=SIDE1
1,11
101,111
*NGEN,NSET=SIDE2
21,31
121,131
*NSET,NSET=BORE
1,101
** Ten CAX4 elements each in the mandrel and in the tube
*ELEMENT,TYPE=CAX4
1,1,2,102,101
11,21,22,122,121
*ELGEN,ELSET=MANDREL
1,10
*ELGEN,ELSET=TUBE
11,10
*SOLID SECTION,ELSET=MANDREL,MATERIAL=CARBIDE
*MATERIAL,NAME=CARBIDE
*ELASTIC
590.,.258
*PLASTIC
3.33
*SOLID SECTION,ELSET=TUBE,MATERIAL=STEEL
*MATERIAL,NAME=STEEL
*ELASTIC
2.E2,.3
*PLASTIC
1.
** Frictionless interface elements between mandrel surface and tube bore
*ELEMENT,TYPE=INTER2A,ELSET=SFIT
101,111,11,121,21
*INTERFACE,ELSET=SFIT
*FRICTION
0.
** Axial displacement (dof 2) fixed at all mandrel (SIDE1) and tube (SIDE2) nodes
*BOUNDARY
SIDE1,2
SIDE2,2
** Step 1: assembly (interference fit)
*STEP,NLGEOM,CYCLE=10
*STATIC,PTOL=1.E-4,DIRECT
1.,1.
*END STEP
** Step 2: separation (mandrel and interface elements removed)
*STEP,NLGEOM
*STATIC,PTOL=1.E-4,DIRECT
1.,1.
*MODEL CHANGE,REMOVE
MANDREL,SFIT
*END STEP
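In the deck, only the end nodes of each radial line are given explicitly; *NGEN fills in the intermediate nodes, equally spaced on the straight line between the end nodes when its default options are used. The hypothetical helper below mimics that behavior in Python to show the resulting radial mesh (ten elements in the mandrel and ten in the tube); it is an illustration, not ABAQUS code.

```python
def ngen(first, last, xyz_first, xyz_last, increment=1):
    """Equally spaced nodes on the line from node `first` to node `last`
    (hypothetical re-implementation of the default *NGEN behaviour)."""
    steps = (last - first) // increment
    nodes = {}
    for i in range(steps + 1):
        t = i / steps
        nodes[first + i * increment] = tuple(
            a + t * (b - a) for a, b in zip(xyz_first, xyz_last))
    return nodes

# End nodes of the z = 0 lines in Table 1, as (r, z)
mandrel = ngen(1, 11, (0.0, 0.0), (1.007415, 0.0))   # part of NSET SIDE1
tube = ngen(21, 31, (1.0, 0.0), (1.431, 0.0))        # part of NSET SIDE2

all_nodes = {**mandrel, **tube}
for label in sorted(all_nodes):
    r, z = all_nodes[label]
    print(f"node {label:3d}: r = {r:.6f}, z = {z}")
```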
When the mandrel is considered elastic, the interference-fit assembly is solved iteratively. The same results have been obtained for the one-dimensional models (E1 and E2). A comparison of two models (E1 and R1) for the hoop stresses during and after swaging is shown in Figure 7. The elasticity of the mandrel reduces the amount of overstrain from 70 to 60 percent. The numerical results for the two-dimensional model (E3) are presented in Figures 8 through 11. Figure 8 shows the distributions of hoop stresses during and after swaging, and Figure 9 shows the corresponding distributions of maximum and residual axial stresses. Figures 8 and 9 also show the one-dimensional results based on model E1; comparing the results of models E1 and E3 determines the two-dimensional effect on these stresses. Figure 10 shows the radial stresses based on four models (E1, E3, R1, R3). Finally, the radial displacements along the bore based on several models are presented in Figure 11. The displacements during and after swaging are represented by U and U", respectively. The measured permanent bore enlargement is also shown in the figure. Comparing the results of models E1 and R1 shows that mandrel elasticity gives a smaller value of U". Including the two-dimensional effect with model E3 gives a value of U" even smaller than that of the one-dimensional model.
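Why an elastic mandrel lowers the overstrain can be seen from a purely elastic plane-strain estimate: the contact pressure must open the tube bore and at the same time compress the mandrel, so part of the interference is taken up by the mandrel. The sketch below assumes small strains, a solid mandrel, elastic behavior of both parts, and the material constants of Table 1; it is not the iterative elastic-plastic procedure used in the finite element runs, and its function name is hypothetical.

```python
def contact_pressure(delta, a, b, E_t, nu_t, E_m=None, nu_m=None):
    """Elastic interference-fit pressure at the bore radius a (plane strain).
    E_m=None treats the mandrel as rigid."""
    # Radial bore displacement of the tube per unit internal pressure (Lame)
    ratio = (a**2 + b**2) / (b**2 - a**2)
    tube = a * (1 + nu_t) / E_t * ((1 - nu_t) * ratio + nu_t)
    # Inward surface displacement of a solid mandrel per unit external pressure
    mandrel = 0.0 if E_m is None else a * (1 + nu_m) * (1 - 2 * nu_m) / E_m
    # Compatibility: tube opening + mandrel shrinkage = interference
    return delta / (tube + mandrel)

I_MAX = 0.007415
p_rigid = contact_pressure(I_MAX, 1.0, 1.431, 200.0, 0.3)
p_elastic = contact_pressure(I_MAX, 1.0, 1.431, 200.0, 0.3, E_m=590.0, nu_m=0.258)
print(f"contact pressure ratio, elastic/rigid mandrel: {p_elastic / p_rigid:.3f}")
```

A lower contact pressure for the same interference means less plastic deformation of the tube, which is the trend reported above.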
Figure 1. One-dimensional interference-fit assembly.


(a) solid element (CAX4)  (b) interface element (INTER2A)

Figure  2.  Axisymmetric  solid  and  interface  elements. 


(a)  interference-fit 


(b)  separation 


Figure  3.  A  two-dimensional  finite  element  model. 
Figure 10. A comparison of radial stresses for four models.

Figure 11. Radial displacements along the bore.


NONMONOTONIC  STRESS-STRAIN  LAWS: 
BIZARRE  BEHAVIOR  AND  ITS  REPERCUSSIONS 
ON  NUMERICAL  SOLUTIONS* 


Ted  Belytschko  and  David  Lasry 
Department  of  Mechanical  Engineering 
Northwestern  University 
Evanston,  Illinois  60208 


ABSTRACT 

The properties of solutions with nonmonotonic stress-strain laws are described. Some particular properties are: severe mesh dependence, an apparent lack of convergence, and chaotic results for converging waves. These results are partially explained by examining a closed form solution for a simple problem, which shows that in a nonmonotonic continuum the unstable dynamic response localizes to a set of measure zero.

To remedy this difficulty, localization limiters have been introduced to provide solutions in which the deformation concentrates in regions of finite size. Several formulations of such limiters are discussed, with particular reference to stability and computational issues. Various applications are presented.


1. INTRODUCTION

Solutions of problems involving a strain-softening material law are fraught with serious difficulties, from both mathematical and numerical points of view. In dynamics, these difficulties were illustrated in Ref. [1] using a simple one-dimensional wave propagation model: an elastic wave propagating in a bar travels with a velocity proportional to √E_t, where E_t is the tangent modulus. When E_t becomes negative, as is the case for a strain-softening material, the wave can no longer propagate, giving rise to what Freund [2] calls a deformation-trapping phenomenon: the deformation is trapped in a certain zone of the body and no information can be transmitted to the rest of the material. Bazant and Belytschko [1] have shown by a closed form solution that strain-softening in transient problems is characterized by the appearance of infinite strains on a set of measure zero. In numerical simulations this is reflected by a strong dependence of the results on the refinement of the mesh [3]. When the equations are discretized, by finite elements for example, the deformation localizes in the smallest discrete cell of material capable of representing that set of measure zero, namely a single element when constant-strain elements are used in one dimension, or a band one element wide in two dimensions. The problem of mesh dependence is not intrinsically a numerical one; rather, it stems from the more fundamental loss of strict hyperbolicity of the equations of motion [4] upon entering the strain-softening regime.
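For a bar with a constant tangent modulus (a frozen-coefficient assumption made only for this illustration), the loss of wave propagation can be written out explicitly:

```latex
\rho\,u_{,tt} = E_t\,u_{,xx}, \qquad u = A\,e^{ik(x - ct)}
\quad\Longrightarrow\quad c^2 = \frac{E_t}{\rho}

E_t < 0: \qquad c = \pm i\sqrt{|E_t|/\rho}
\quad\Longrightarrow\quad u = A\,e^{ikx}\,e^{\pm k\sqrt{|E_t|/\rho}\,t}
```

The growing solution has the rate k√(|E_t|/ρ), which increases without bound with the wavenumber k; the shortest wavelength a mesh can represent therefore grows fastest, which is consistent with localization into a single element.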
In statics, localization has been associated with the loss of ellipticity of the incremental equilibrium equations [5], with the existence of a bifurcation from a homogeneous state of deformation into a nonhomogeneous one, and with the appearance of multiple equilibrium paths. This approach provides the orientation of the localization band and the critical load at which localization may be triggered, but it does not provide any length parameter for the subsequent behavior; in this respect it differs from the dynamic case, where a localization zone (albeit reduced to a single point or line) appears in the closed form solutions [3]. This difference is reflected in the numerical simulations as well. If a solid with no imperfections is subjected to a homogeneous state of deformation, the numerical solution of a static problem will follow that homogeneous deformation path even after it becomes unstable beyond the bifurcation point, provided the machine precision is sufficient to prevent round-off error from triggering an inhomogeneous mode.

In order to circumvent these difficulties, the concept of localization limiters has been proposed in [3,4]. The essential idea of these limiters is to change the character of the equations so that the region of localization does not degenerate to a set of measure zero. The limiters proposed in [3] and [4] were, respectively, of two distinct types: integral limiters based on nonlocal constitutive equations, and differential limiters based on higher order derivatives of the strain.

One  purpose  of  this  paper  is  to  present  analyses  of  the  governing  equations  with  and 
without  limiters  in  one  dimension  and  in  the  case  of  antiplane  motion  in  two  dimensions.  It  is 
shown  that  without  limiters,  the  static  equations  lose  ellipticity  for  strain  softening  materials 
and  nonassociated  plastic  laws,  while  the  dynamic  equations  lose  strict  hyperbolicity.  With 
the  gradient-type  localization  limiter,  the  dynamic  equations  change  from  hyperbolic  to 
parabolic,  which  introduces  a  length  scale. 

It is also shown in this paper that the two types of localization limiters, differential and integral, possess very similar characteristics. Both limiters (1) exhibit a stable response to short-wavelength input and an unstable response to long-wavelength input, and (2) limit the localization to a width that depends only on the length parameter. It is noted that even with the limiter the discrete tangent stiffness does not remain positive definite, so the numerical difficulties associated with strain-softening in local materials still appear.

The paper is organized as follows: Section 2 deals with the relationship between localization and change of type in the governing equations. Section 3 classifies the different localization limiters. In Section 4, solutions are given for simple problems.


2.  CONDITION  FOR  LOCALIZATION  AND  CHANGE  OF  TYPE 

In  order  to  better  understand  the  difficulties  associated  with  the  localization 
phenomenon  and  the  role  of  the  gradient  localization  limiter,  the  relation  between  the  onset  of 
localization  and  a  change  of  type  of  the  governing  equations  is  investigated  here. 
2.1 Change of type in statics and dynamics

The purpose of this section is to derive, for a simple problem, the condition for the onset of localization in statics and dynamics, and to relate it to the type of the system of PDEs governing the problem.

For the general three-dimensional case, the equations of motion are written:

$$\sigma_{ij,j} + b_i = \rho\, v_{i,t} \qquad (2.1)$$

where σ is the Cauchy stress tensor, v the velocity vector, ρ the mass density and b the vector of body forces. Subscript indices preceded by a comma denote partial derivatives. The body forces enter the governing equations only as a forcing term, so they can be omitted when studying the character of the equations.

The constitutive law relating the stress and strain rates is written:

$$\dot\sigma_{ij} = C_{ijkl}\,\dot\varepsilon_{kl} \qquad (2.2)$$

where the tensor C_ijkl has the minor symmetries C_ijkl = C_jikl = C_ijlk, and ε is the strain tensor defined as:

$$\varepsilon_{kl} = \tfrac{1}{2}\left(u_{k,l} + u_{l,k}\right) \qquad (2.3)$$


The equations will be analyzed for a simple antiplane shear problem, but the main results remain valid for the general three-dimensional case. The antiplane shear problem has been studied for a class of incompressible hyperelastic materials by Knowles [7] in statics, and by Freund et al. [8] and Toulios [9] in dynamics. For this problem, the displacement and stress fields are as follows:

$$u_1 = u_2 = 0,\quad u_3 = u_3(x_1,x_2),\quad \sigma_{13} = \sigma_{13}(x_1,x_2),\quad \sigma_{23} = \sigma_{23}(x_1,x_2) \qquad (2.4)$$

In statics, the equilibrium equations (2.1) then reduce to:

$$\sigma_{13,1} + \sigma_{23,2} = 0 \qquad (2.5)$$

The constitutive law reads:

$$\sigma_{13} = 2C_{1313}\,\varepsilon_{13} + 2C_{1323}\,\varepsilon_{23}, \qquad \sigma_{23} = 2C_{2313}\,\varepsilon_{13} + 2C_{2323}\,\varepsilon_{23} \qquad (2.6)$$

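We make the simplifying assumption that stress is a single-valued function of strain. This holds for elastic-plastic laws as long as there is no unloading at any point. We can then write:

$$\sigma_{13,1} = \frac{\partial\sigma_{13}}{\partial\varepsilon_{13}}\,\frac{\partial\varepsilon_{13}}{\partial x_1} + \frac{\partial\sigma_{13}}{\partial\varepsilon_{23}}\,\frac{\partial\varepsilon_{23}}{\partial x_1} = 2C_{1313}\,\varepsilon_{13,1} + 2C_{1323}\,\varepsilon_{23,1} \qquad (2.7)$$

and a similar relation for σ23,2. We look for solutions that have discontinuities in ε_α3,β along a line Γ defined by its local normal n = (n1, n2, 0). The tangent vector to Γ at the current point is s = (s1, s2, 0) = (−n2, n1, 0).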


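The governing equations along Γ (equilibrium, compatibility, and the directional derivatives along s) can be cast in matrix form:

$$\begin{bmatrix} C_{1313} & C_{2313} & C_{1323} & C_{2323} \\ 0 & 1 & -1 & 0 \\ s_1 & s_2 & 0 & 0 \\ 0 & 0 & s_1 & s_2 \end{bmatrix} \begin{Bmatrix} \varepsilon_{13,1} \\ \varepsilon_{13,2} \\ \varepsilon_{23,1} \\ \varepsilon_{23,2} \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \\ \varepsilon_{13,s} \\ \varepsilon_{23,s} \end{Bmatrix} \qquad (2.8)$$

This relation is of the form A x = c, where x is the vector of strain gradients; in order for x not to be unique, we must require

$$\det \mathbf{A} = 0 \qquad (2.9)$$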

which yields in this case:

$$-s_1^2\,C_{2323} + s_1 s_2\,(C_{1323} + C_{2313}) - s_2^2\,C_{1313} = 0 \qquad (2.10)$$

or, in terms of the normal vector n:

$$n_2^2\,C_{2323} + n_1 n_2\,(C_{1323} + C_{2313}) + n_1^2\,C_{1313} = 0 \qquad (2.11)$$

The above can be written

$$\det(n_i\,C_{ijkl}\,n_l) = \det(\mathbf{n}\cdot\mathbf{C}\cdot\mathbf{n}) = 0 \qquad (2.12)$$


which  is  the  classical  localization  condition  [5,10].  The  loss  of  uniqueness  corresponds  to 
the  loss  of  ellipticity  of  the  governing  equations,  or  in  other  words,  to  the  appearance  of  real 
characteristics  which  are  associated  with  equations  of  a  hyperbolic  type. 
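For the antiplane case, (2.11) states that localization is possible on a line of normal n once the quadratic form n·D·n, with D_kl = C_k3l3 (the matrix introduced in (2.19)), can vanish. Only the symmetric part of D enters this form, so strong ellipticity holds exactly when the smallest eigenvalue of that symmetric part is positive. The NumPy sketch below (function names hypothetical) checks this and returns the critical normal; the nonsymmetric example mimics a non-associated tangent whose D has a positive determinant and positive diagonal, yet fails the test.

```python
import numpy as np

def strongly_elliptic(D):
    """True if n.D.n > 0 for every unit normal n, so (2.11) has no real root.
    D[k, l] = C_k3l3; only its symmetric part enters n.D.n."""
    S = 0.5 * (D + D.T)
    return np.linalg.eigvalsh(S)[0] > 0.0

def critical_normal(D):
    """Unit normal minimising n.D.n, together with that minimum value."""
    S = 0.5 * (D + D.T)
    w, V = np.linalg.eigh(S)
    return V[:, 0], w[0]

D_elastic = 80.0 * np.eye(2)                     # isotropic elastic case
D_nonassoc = np.array([[1.0, 3.0], [0.0, 1.0]])  # hypothetical non-symmetric tangent

for name, D in [("elastic", D_elastic), ("non-associated", D_nonassoc)]:
    n, q = critical_normal(D)
    print(f"{name}: strongly elliptic = {strongly_elliptic(D)}, n = {n}, min n.D.n = {q:.3f}")
```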

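We focus now on the dynamic case. The equation of motion for the antiplane problem is:

$$\sigma_{13,1} + \sigma_{23,2} = \rho\, v_{3,t} \qquad (2.13)$$

The cross-derivative relations, which follow from (2.3),

$$\varepsilon_{13,t} = \tfrac{1}{2}\,v_{3,1}, \qquad \varepsilon_{23,t} = \tfrac{1}{2}\,v_{3,2} \qquad (2.14)$$

are combined with the equation of motion to yield a system of first order PDEs:

$$\begin{bmatrix} \rho & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{Bmatrix} v_3 \\ \varepsilon_{13} \\ \varepsilon_{23} \end{Bmatrix}_{,t} - \begin{bmatrix} 0 & 2C_{1313} & 2C_{1323} \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{Bmatrix} v_3 \\ \varepsilon_{13} \\ \varepsilon_{23} \end{Bmatrix}_{,1} - \begin{bmatrix} 0 & 2C_{2313} & 2C_{2323} \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{Bmatrix} v_3 \\ \varepsilon_{13} \\ \varepsilon_{23} \end{Bmatrix}_{,2} = 0 \qquad (2.15)$$

This system is of the form:

$$\mathbf{A}_1 \mathbf{U}_{,1} + \mathbf{A}_2 \mathbf{U}_{,2} + \mathbf{A}_t \mathbf{U}_{,t} = 0 \qquad (2.16)$$

where A_t is nonsingular. The condition for φ(x1, x2, t) = 0 to be a characteristic surface of (2.15) is [11]:

$$\det(\boldsymbol{\Lambda}) = 0 \qquad (2.17a)$$

where

$$\boldsymbol{\Lambda} = \mathbf{A}_1\,\phi_{,1} + \mathbf{A}_2\,\phi_{,2} + \mathbf{A}_t\,\phi_{,t} \qquad (2.17b)$$

which yields here:

$$\phi_{,t}\left(-\rho\,\phi_{,t}^2 + (C_{1323} + C_{2313})\,\phi_{,1}\phi_{,2} + C_{1313}\,\phi_{,1}^2 + C_{2323}\,\phi_{,2}^2\right) = 0 \qquad (2.18)$$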


The extra factor φ,t in (2.18) corresponds to a characteristic surface with zero velocity and is a result of introducing an additional dependent variable by choosing strains and velocity as the dependent variables [12]. In order to better understand the meaning of (2.18), we define the constitutive matrix D such that:

$$D_{kl} = C_{k3l3} \qquad (2.19)$$

and select a new coordinate system (1, 2) defined by the principal directions of D, so that in the new coordinate system:

$$\mathbf{D} = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \qquad (2.20)$$

Discarding the zero-velocity factor φ,t, Equation (2.18) can thus be written as:

$$-\rho\,\phi_{,t}^2 + D_1\,\phi_{,1}^2 + D_2\,\phi_{,2}^2 = 0 \qquad (2.21)$$

Characteristic surfaces are cones of elliptic section, as illustrated in figure 1; the equation of the cone passing through a point (x0, y0, t0) is:

$$(t - t_0)^2 = \frac{(x - x_0)^2}{c_1^2} + \frac{(y - y_0)^2}{c_2^2} \qquad (2.22)$$

where

$$c_1 = \sqrt{D_1/\rho} \quad\text{and}\quad c_2 = \sqrt{D_2/\rho} \qquad (2.23)$$


As D loses positive definiteness, say for example D1 remains strictly positive and D2 approaches zero, the cone collapses to a plane surface. Considered as a function of the variables (y, t), the system loses strict hyperbolicity; equivalently, real waves no longer propagate in every direction (in our case they stop propagating in the y direction). It is therefore seen that here the condition of strict hyperbolicity of the system of governing equations and the condition of strong ellipticity are equivalent.
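Equation (2.23) gives the principal speeds of the characteristic cone directly. The NumPy sketch below (function name hypothetical) evaluates them from D; since only the symmetric part of D enters (2.18), its principal values are used as D1 and D2. A principal value that reaches zero gives a zero speed (the collapsed cone), and a negative one gives an imaginary speed, i.e. no real wave in that direction.

```python
import numpy as np

def characteristic_speeds(D, rho):
    """c_i = sqrt(D_i / rho) from (2.23), in ascending order of D_i."""
    D = np.asarray(D, dtype=float)
    Di = np.linalg.eigvalsh(0.5 * (D + D.T))
    # Complex square root: a negative D_i gives a purely imaginary speed
    return np.sqrt(Di.astype(complex) / rho)

rho = 1.0
for D2 in (0.01, 0.0, -0.01):          # D1 = 1 held fixed while D2 decreases
    print(D2, characteristic_speeds(np.diag([1.0, D2]), rho))
```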


It should be pointed out that when a viscoplastic constitutive law is used, the equations of motion do not lose hyperbolicity. This is readily seen by observing that for viscoplastic models the rate constitutive relation is written as:

$$\dot\sigma_{ij} = C_{ijkl}\,\dot\varepsilon_{kl} - R_{ij}(\boldsymbol\sigma) \qquad (2.24)$$

where C_ijkl is the elastic tensor and the inelastic part is embedded in the term R_ij. The type of the system of governing equations is determined by C_ijkl, so it remains strictly hyperbolic, the inelastic effects appearing only as a forcing term.

2.2 Relation between strain softening and localization for elasto-plastic materials

The rate constitutive relations for an elasto-plastic material are written in tensor form as:

$$\dot{\boldsymbol\sigma} = \mathbf{C}:\dot{\boldsymbol\varepsilon} = \mathbf{C}^e:\dot{\boldsymbol\varepsilon} - \frac{(\mathbf{C}^e:\mathbf{P})\,(\mathbf{Q}:\mathbf{C}^e:\dot{\boldsymbol\varepsilon})}{h + \mathbf{Q}:\mathbf{C}^e:\mathbf{P}} \qquad (2.25)$$

where P and Q are symmetric second-order tensors giving, respectively, the direction of the plastic deformation and the outer normal to the yield surface, h is the rate of hardening, and C^e is the elasticity tensor:

$$C^e_{ijkl} = \lambda\,\delta_{ij}\delta_{kl} + G\,(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}) \qquad (2.26)$$

and λ and G are Lamé's constants. The case P = Q corresponds to plastic normality; P ≠ Q corresponds to a non-associative flow rule.

We consider again the antiplane shear problem and focus on the relation between the localization condition and the hardening modulus in the case of an elasto-plastic material model. For localization to occur on a plane of normal n, the condition (2.12) has to be met. We can write that condition in a set of Cartesian axes (n, e3 × n, e3), where e3 is the unit vector in the "3" direction. With subscripts denoting components in that set of axes, and since in antiplane shear P and Q have only out-of-plane shear components, the constitutive law (2.25)-(2.26) reduces the localization condition to:

$$C_{1313} = G - \frac{4G^2\,P_{13}\,Q_{13}}{h + 4G\,(P_{13}Q_{13} + P_{23}Q_{23})} = 0 \qquad (2.27)$$

or equivalently (index 2 denoting the tangential direction e3 × n)

$$\frac{h}{G} = -2\,(2\,P_{23}\,Q_{23}) \qquad (2.28)$$

This expression shows that if plastic normality holds (i.e. P = Q), then localization can only occur with negative h, that is, in a strain-softening regime, whereas if normality does not apply, it is possible for localization to be triggered with a positive h.

This result, obtained for the particular case of the antiplane shear problem, is in fact general, as shown by Rudnicki and Rice [13] and Rice [5]. In the three-dimensional case, (2.28) can be generalized to (with α, β denoting components on Cartesian axes in the plane of localization):

$$\frac{h}{G} = -2\,P_{\alpha\beta}\,Q_{\alpha\beta} - \frac{2\nu}{1-\nu}\,P_{\alpha\alpha}\,Q_{\beta\beta} \qquad (2.29)$$

where ν is Poisson's ratio, and the conclusions derived previously remain valid. An example of a material model where localization occurs for positive h can be found in [13].
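Equation (2.29) can be cross-checked numerically against the localization condition (2.12) itself. For the tangent (2.25), the acoustic tensor has the form A^e − a bᵀ/(h + Q:C^e:P), with a = n·(C^e:P) and b = (Q:C^e)·n, so by the matrix determinant lemma det(n·C·n) vanishes when h = bᵀ(A^e)⁻¹a − Q:C^e:P. The NumPy sketch below (function names hypothetical, elastic constants arbitrary) compares that value with (2.29) for random symmetric P, Q and random normals.

```python
import numpy as np

def elastic_tensor(lam, G):
    """Isotropic C^e_ijkl of (2.26)."""
    d = np.eye(3)
    return (lam * np.einsum('ij,kl->ijkl', d, d)
            + G * (np.einsum('ik,jl->ijkl', d, d) + np.einsum('il,jk->ijkl', d, d)))

def h_crit_acoustic(P, Q, n, lam, G):
    """h at which det(n.C.n) = 0 for the tangent (2.25)."""
    Ce = elastic_tensor(lam, G)
    CeP = np.einsum('ijkl,kl->ij', Ce, P)         # C^e : P
    QCe = np.einsum('ij,ijkl->kl', Q, Ce)         # Q : C^e
    Ae = np.einsum('i,ijkl,l->jk', n, Ce, n)      # elastic acoustic tensor
    a = n @ CeP                                   # a_j = n_i (C^e:P)_ij
    b = QCe @ n                                   # b_k = (Q:C^e)_kl n_l
    return b @ np.linalg.solve(Ae, a) - np.einsum('ij,ij->', Q, CeP)

def h_crit_formula(P, Q, n, nu, G):
    """Eq. (2.29) with alpha, beta on an orthonormal basis of the plane normal to n."""
    helper = [1.0, 0.0, 0.0] if abs(n[0]) < 0.9 else [0.0, 1.0, 0.0]
    t1 = np.cross(n, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    T = np.stack([t1, t2])                        # rows: in-plane basis vectors
    Pin, Qin = T @ P @ T.T, T @ Q @ T.T
    return G * (-2.0 * np.sum(Pin * Qin)
                - 2.0 * nu / (1.0 - nu) * np.trace(Pin) * np.trace(Qin))

rng = np.random.default_rng(0)
lam, G = 1.5, 1.0
nu = lam / (2.0 * (lam + G))                      # Poisson's ratio from Lame constants
P = rng.normal(size=(3, 3)); P = 0.5 * (P + P.T)
Q = rng.normal(size=(3, 3)); Q = 0.5 * (Q + Q.T)
n = rng.normal(size=3); n /= np.linalg.norm(n)
print(h_crit_acoustic(P, Q, n, lam, G), h_crit_formula(P, Q, n, nu, G))
```

In the antiplane special case (P and Q with only 13 and 23 components, n in the x1-x2 plane) both functions reduce to h/G = −4 P_s3 Q_s3, with s the tangent direction, which is the form of (2.28).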


3.  LOCALIZATION  LIMITERS 

Localization  limiters  can  be  classified  as  follows: 

1.  nonlocal  or  integral  limiters  where  the  strain  measure  includes  an  integral  of  the 
deformation  over  a  finite  domain  [3]. 

2.  differential  limiters  where  the  strain  or  stress  measures  include  derivatives  of  order 
higher  than  one  [4,14-17]. 

3. rate limiters, where a time dependence is built into the equations [18].

The  rationale  underlying  the  nonlocal  limiters  is  that  a  classical  local  theory  does  not 
take  into  account  the  influence  of  the  length  scale  associated  with  a  rapidly  varying  strain 
field  on  the  stress  distribution,  an  essential  part  of  the  localization  phenomena. 

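In the case of a one-dimensional rod with strain-softening, a nonlocal limiter is obtained by defining the stress field σ(x) as a function of a nonlocal strain ε̄(x) [3]:

$$\sigma(x) = \sigma(\bar\varepsilon(x)) \qquad (3.1)$$

with

$$\bar\varepsilon(x) = \frac{1}{\ell}\int_{-\ell/2}^{\ell/2} \varepsilon(x+s)\,w(s)\,ds \qquad (3.2)$$

where ℓ is the averaging length and w(s) a weighting function.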
The gradient-type limiter in a one-dimensional context is given by [4]:

$$\bar\varepsilon(x) = \varepsilon(x) + a\,\varepsilon_{,xx}(x) \qquad (3.3)$$

These two limiters are related through a Taylor expansion [4] and actually differ by a term of order o(ℓ²), provided that (for a uniform weight w(s) = 1):

$$a = \frac{\ell^2}{24} \qquad (3.4)$$
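Relation (3.4) follows from expanding ε(x + s) in a Taylor series inside (3.2): with a uniform weight w(s) = 1 (the assumption used here), the s² term contributes (ℓ²/24)ε,xx and the first neglected term is (ℓ⁴/1920)ε,xxxx. The sketch below checks this numerically for a hypothetical Gaussian strain concentration, using a hand-written trapezoidal rule; halving ℓ should reduce the difference between (3.2) and (3.3) by roughly a factor of 16.

```python
import numpy as np

def nonlocal_strain(eps, x, ell, n=2000):
    """(3.2) with w(s) = 1, evaluated by the composite trapezoidal rule."""
    h = ell / n
    s = -ell / 2 + h * np.arange(n + 1)
    f = eps(x + s)
    return h * (f.sum() - 0.5 * (f[0] + f[-1])) / ell

def gradient_strain(eps, eps_xx, x, ell):
    """(3.3) with a = ell**2 / 24 from (3.4)."""
    return eps(x) + ell**2 / 24.0 * eps_xx(x)

def eps(x):
    """Hypothetical strain concentration."""
    return np.exp(-x**2)

def eps_xx(x):
    """Second derivative of eps."""
    return (4.0 * x**2 - 2.0) * np.exp(-x**2)

for ell in (0.2, 0.1):
    diff = nonlocal_strain(eps, 0.0, ell) - gradient_strain(eps, eps_xx, 0.0, ell)
    print(f"ell = {ell}: nonlocal - gradient = {diff:.3e}")
```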

In dynamic problems, the effect of the differential and rate limiters, from a mathematical point of view, is that the governing equations no longer become elliptic with the onset of strain-softening. This can be seen in a one-dimensional context for a path-independent material by combining the equation of motion and the compatibility condition into a system of first order partial differential equations:

$$\begin{Bmatrix} v \\ \varepsilon \end{Bmatrix}_{,t} + \begin{bmatrix} 0 & -\sigma'(\varepsilon)/\rho \\ -1 & 0 \end{bmatrix} \begin{Bmatrix} v \\ \varepsilon \end{Bmatrix}_{,x} = 0 \qquad (3.5)$$

where v, σ(ε), ε and ρ are, respectively, the particle velocity, stress, strain and density, and a subscript comma denotes a partial derivative. This system is of the type

$$\mathbf{A}\mathbf{U}_{,t} + \mathbf{B}\mathbf{U}_{,x} = \mathbf{c}(\mathbf{U}) \qquad (3.6)$$

where one of the matrices A or B, e.g. A, is nonsingular and c is a forcing vector. The nature of (3.6) is determined by the roots of the characteristic determinant, det(B − λA) (or det(A − λB) if A is singular). In the case of Eq. (3.5),

$$\det(\mathbf{B} - \lambda\mathbf{A}) = \det(\mathbf{B} - \lambda\mathbf{I}) = \lambda^2 - \frac{\sigma'(\varepsilon)}{\rho} \qquad (3.7)$$

so the system becomes elliptic when σ'(ε) < 0 (which corresponds to strain-softening), because the determinant (3.7) no longer possesses real roots.

When the differential limiter defined in (3.3) is included in the formulation, a modified system of PDEs is obtained:

$$\begin{bmatrix} 1&0&0&0\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&1 \end{bmatrix} \begin{Bmatrix} \varepsilon\\ w_1\\ w_2\\ v \end{Bmatrix}_{,t} + \begin{bmatrix} 0&0&0&-1\\ 1&0&0&0\\ 0&1&0&0\\ -\sigma'/\rho & 0 & -a\,\sigma'/\rho & 0 \end{bmatrix} \begin{Bmatrix} \varepsilon\\ w_1\\ w_2\\ v \end{Bmatrix}_{,x} = \begin{Bmatrix} 0\\ w_1\\ w_2\\ 0 \end{Bmatrix} \qquad (3.8)$$

where w1 = ε,x, w2 = ε,xx and σ' = dσ/dε̄. The characteristic determinant

$$\det(\mathbf{A} - \lambda\mathbf{B}) = -\frac{a\,\sigma'}{\rho}\,\lambda^4 \qquad (3.9)$$

possesses four real roots, all equal (λ = 0), irrespective of the sign of σ', so the system is parabolic. It should be pointed out that, when a rate-type limiter is used via a viscoplastic material model, the governing equations remain hyperbolic [18].
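The change of type can be confirmed numerically. The NumPy sketch below (function names hypothetical) orders the unknowns as (v, ε) for the local system v,t = (σ'/ρ)ε,x, ε,t = v,x, and as (ε, w1, w2, v) for the gradient system ε,t = v,x, ε,x = w1, w1,x = w2, ρv,t = σ'(ε,x + a w2,x), with σ' treated as a frozen constant. It returns the roots of det(B − λI) for the first system and evaluates det(A − λB) for the second.

```python
import numpy as np

def local_char_roots(sig_p, rho):
    """Roots of det(B - lambda I) for the local system, U = (v, eps)."""
    B = np.array([[0.0, -sig_p / rho],
                  [-1.0, 0.0]])
    return np.linalg.eigvals(B)

def gradient_char_det(lam, sig_p, rho, a):
    """det(A - lambda B) for the gradient system, U = (eps, w1, w2, v)."""
    A = np.diag([1.0, 0.0, 0.0, 1.0])
    B = np.array([[0.0, 0.0, 0.0, -1.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [-sig_p / rho, 0.0, -a * sig_p / rho, 0.0]])
    return np.linalg.det(A - lam * B)

rho, a = 1.0, 0.1667
for sig_p in (1.0, -0.25):             # hardening, then softening tangent
    print(sig_p, local_char_roots(sig_p, rho),
          [gradient_char_det(lam, sig_p, rho, a) for lam in (0.5, 1.0, 2.0)])
```

For σ' = −0.25 the local roots are ±0.5i (elliptic), whereas the gradient determinant equals −(aσ'/ρ)λ⁴ for every λ, so its only root, λ = 0, is real and fourfold for either sign of σ'.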

In order to better understand the behavior of the integral and differential limiters, a Fourier analysis by the method of frozen coefficients is useful. In this analysis a displacement disturbance δu is applied to the body, and the material is considered to be in a strain-softening state over an interval [x1, x2]:

$$\delta\sigma(x) = -|E_t|\,\delta\varepsilon(x) \quad \text{for } x \in [x_1, x_2] \qquad (3.10)$$

where the tangent modulus E_t < 0 is assumed constant. We then look for possible wave solutions of the form

$$\delta u(x,t) = A\,e^{ik(x - vt)} \qquad (3.11)$$

for the equation of motion for δu:

$$\delta u_{,tt} - \frac{1}{\rho}\,\delta\sigma^{\text{nonloc}}(x)_{,x} = 0 \qquad (3.12)$$

The following dispersion relation was obtained in [4] with the gradient-type limiter ε^nonloc = ε̄ defined by Eqn. (3.3):

$$kv = i\left\{\frac{|E_t|}{\rho}\,(1 - a k^2)\right\}^{1/2} k \equiv i\,\beta(k) \qquad (3.13)$$

A similar analysis can be done for the integral limiter ε^nonloc(x) = ε̄(x) defined in Eqn. (3.2), with w(s) = 1. Looking for wave solutions of the equation of motion

$$\delta u_{,tt} - \frac{E_t}{\rho\,\ell}\left(\int_{-\ell/2}^{\ell/2} \delta\varepsilon(x+s)\,ds\right)_{,x} = 0 \qquad (3.14)$$

one obtains the following relation:

$$kv = i\left\{\frac{|E_t|}{\rho\,\ell}\,2k\,\sin\frac{k\ell}{2}\right\}^{1/2} \equiv i\,\gamma(k) \qquad (3.15)$$

The two functions β(k) and γ(k) are plotted in fig. 2, for values of a and ℓ related through eqn. (3.4). In [4] the plot of γ(k) was interpreted to mean that the growth of short-wavelength inputs, which are liable to develop in the narrow localization zones, is bounded when the limiter is present.


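It is interesting to notice that for small values of a, the expression of γ(k) can be expanded, and using (3.4):

$$\gamma(k) \approx \left\{\frac{|E_t|}{\rho}\left(1 - \frac{k^2\ell^2}{24}\right)\right\}^{1/2} k = \left\{\frac{|E_t|}{\rho}\,(1 - a k^2)\right\}^{1/2} k = \beta(k) \qquad (3.16)$$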
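Taking the imaginary part of kv as the growth rate of a disturbance, (3.13), (3.15) and the case without a limiter can be compared directly. The sketch below assumes the Section 4.1 values |E_t| = 0.25 and ρ = 1, a hypothetical averaging length ℓ = 1 with a from (3.4), and hypothetical function names. It sets the growth rate to zero where the bracket in (3.13) or (3.15) is negative, since kv is then real and the disturbance only oscillates.

```python
import numpy as np

def growth_local(k, Et, rho):
    """No limiter: k * sqrt(|Et| / rho), unbounded in k."""
    return k * np.sqrt(abs(Et) / rho)

def growth_gradient(k, Et, rho, a):
    """beta(k) of (3.13); zero where a*k**2 >= 1 (kv real, no growth)."""
    return np.sqrt(np.maximum(0.0, abs(Et) / rho * k**2 * (1.0 - a * k**2)))

def growth_nonlocal(k, Et, rho, ell):
    """gamma(k) of (3.15) for uniform averaging; zero where the bracket is negative."""
    return np.sqrt(np.maximum(0.0, abs(Et) / (rho * ell) * 2.0 * k * np.sin(k * ell / 2.0)))

Et, rho, ell = -0.25, 1.0, 1.0
a = ell**2 / 24.0                      # relation (3.4)
k = np.linspace(0.0, 8.0, 801)
print("max growth, local   :", growth_local(k, Et, rho).max())
print("max growth, gradient:", growth_gradient(k, Et, rho, a).max())
print("max growth, nonlocal:", growth_nonlocal(k, Et, rho, ell).max())
```

For the gradient limiter the growth rate is bounded by √(|E_t|/(4aρ)) and vanishes for k ≥ 1/√a, while for small k the two limiters agree, as (3.16) states.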


A perturbation analysis [4] reveals that, when the differential limiter defined in (3.3) is used, the width of the zone in the strain-softening regime varies with the square root of the parameter a. Numerical simulations confirmed this type of dependence, and for the integral limiter they yield a zone size proportional to the averaging length ℓ [3], as can be expected from relation (3.4). Thus both localization limiters prevent the growth of waves on the scale of the localization bands generated by strain-softening.

As far as static problems are concerned, the only attempts known to the authors to derive closed form solutions using limiters are due to Aifantis and co-workers [16,17], Coleman and Hodgdon [15], and Schreyer and Chen [14]. In the approach of Aifantis and co-workers, higher order terms are included in the evolution equation of the flow stress, and in that sense it is quite similar to the work of Schreyer and Chen. In Coleman and Hodgdon [15], a second order strain gradient is added directly into the constitutive equation without modifying the yield function. The common denominator of all these approaches is that they make the stress field dependent in some way on the spatial derivatives of the strain field. We follow the approach of Coleman and Hodgdon [15], but do not limit our formulation to rigid plastic materials. The expression for the stress is given by:

$$\sigma = \phi(\varepsilon) - a\,\nabla^2\varepsilon \qquad (3.17)$$

where φ(ε) is the usual elastoplastic constitutive law (stress-strain relationship) and a > 0 is a coefficient having the dimensions of a force. The stiffness matrix corresponding to the finite element formulation of (3.17) is developed in [21].
4. NUMERICAL EXAMPLES

In this section numerical solutions are presented with and without limiters for simple problems. Dynamic situations, where no stiffness matrix has to be constructed if explicit time integration is used, are considered first; static problems are considered in subsection 4.3.

4.1  Wave  propagation  in  a  rod 

This problem was considered in [3]; see fig. 3a. Equal and opposite velocities v0 are applied to the two ends of a rod of length 2L made of a strain-softening material, so that tensile waves are generated at the ends. The magnitude of the strain is slightly less than the strain corresponding to the onset of strain-softening. These tensile waves propagate elastically to the center; when they meet there, the stress would double if the behavior remained elastic, so strain-softening starts at this midpoint.

The analytical solution for this problem was proposed in [1]: localization occurs at the midpoint, where the strain becomes infinite. The solution, symmetric about the midpoint x = L, is expressed for the left half as:

$$\varepsilon = \frac{v_0}{c_0}\left[H\!\left(t - \frac{x}{c_0}\right) - H\!\left(t - \frac{2L - x}{c_0}\right) + 4\,\langle c_0 t - L\rangle\,\delta(x - L)\right] \qquad (4.1)$$

where H is the Heaviside step function, ⟨A⟩ = A if A > 0 and ⟨A⟩ = 0 otherwise, δ is the Dirac delta function, and c0 is the elastic wave speed in the material. Numerical studies of this problem based on nonlocal approaches were conducted in [3]. Here we use the localization limiter

$$\bar\varepsilon = \varepsilon + a\,\varepsilon_{,xx} \qquad (4.2)$$

The development of a finite element formulation corresponding to (4.2) can be found in [4]. Particular provisions are made to avoid zero energy modes, and stiffness-proportional damping is added in order to prevent oscillations ahead of the wave front; see [4].

The stress-strain law considered for the calculations is illustrated in the enclosed box in fig. 4. Other parameters used in the calculations were: density ρ = 1., end velocity v0 = 0.6, and, for the stress-strain relation, E = 1., yield stress σp = 1., E_t = −.25, ε_f = 5., with a nearly horizontal tail of slope E_f = .001 beyond ε_f.

It was first checked (see fig. 4) that, without introducing the localization limiter (that is, for a = 0), the strain profiles are severely dependent on the mesh refinement, and the localization zone shrinks to one element, irrespective of its size. Furthermore, the total energy dissipated in the mesh tends to zero as the mesh is refined, as seen in fig. 6a. Convergence studies were then performed with the localization limiter defined in (4.2), for a value a = .1667, for different meshes with increasing numbers of elements (fig. 5). They exhibit a localization limited to a zone of finite size, the length of that zone and the strain profiles being independent of the mesh refinement. Moreover, the total energy dissipated in the rod is independent of the mesh size, all other parameters remaining equal, as illustrated in fig. 6a.
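The vanishing dissipation without a limiter can be understood from a rough estimate. Assuming, for illustration only, that localization is confined to a single element of length h = 2L/n that is strained through the whole softening branch while the rest of the rod unloads elastically, the dissipated energy per unit cross-section is the area under the stress-strain curve up to ε_f times h. With the material constants listed above this area is 2.5 per unit volume, so the estimate decreases linearly with h. The sketch below encodes this idealization; L is left as a parameter because its value is not stated here, and the function names are hypothetical.

```python
def stress(eps, E=1.0, sig_p=1.0, Et=-0.25, eps_f=5.0, Ef=0.001):
    """Monotonic stress-strain law of Section 4.1: linear elastic up to the
    peak, linear softening to zero stress at eps_f, then a nearly flat tail."""
    eps_p = sig_p / E                       # strain at peak stress
    if eps <= eps_p:
        return E * eps
    if eps <= eps_f:
        return sig_p + Et * (eps - eps_p)
    return sig_p + Et * (eps_f - eps_p) + Ef * (eps - eps_f)

def energy_density_to_failure(eps_p=1.0, eps_f=5.0):
    """Work per unit volume to reach zero stress; both branches are linear,
    so the trapezoidal rule on the breakpoints is exact."""
    pts = (0.0, eps_p, eps_f)
    return sum(0.5 * (stress(x0) + stress(x1)) * (x1 - x0)
               for x0, x1 in zip(pts, pts[1:]))

def dissipated_energy(n_elements, half_length=1.0):
    """Idealized one-element localization in a rod of length 2L."""
    h = 2.0 * half_length / n_elements
    return energy_density_to_failure() * h

for n in (10, 20, 40, 80):
    print(n, dissipated_energy(n))
```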
Calculations were also conducted at fixed mesh size for different values of a (see fig. 6b), showing that the length of the localization zone depends linearly on √a. This is consistent with the results of [3], which found a linear dependence on ℓ (the averaging length), since a Taylor expansion yielded the linear relation (3.4) between a and ℓ².

4.2  Spherically  symmetric  problem 

This problem (see fig. 3b) was considered with strain-softening materials in [20]. A sphere made of a strain-softening material is loaded with a uniform traction on its exterior surface. To better appreciate the complexity of this problem, consider the load to be a ramp function in time. Before the onset of strain-softening at an interior surface S, a portion of the stress wave will have passed through S. Due to the spherical geometry, the stresses in this wave are amplified as they converge toward the center, and they trigger the formation of additional strain-softening surfaces. As conjectured in [20], it seems that an infinite number of localization surfaces will appear, although no analytical solution has been proposed so far.

The localization limiter defined previously in (4.2) was used to solve this problem numerically. We considered a sudden application of a uniform normal traction σr = p0 H(t) at the exterior surface R2 = 100; the interior surface is R1 = 10. The applied surface pressure was chosen as p0 = .708; for this boundary condition, the wave propagating from the outer surface remains elastic until the wavefront reaches 0.7R2. The same material constants as in section 4.1 were used.

It  was  first  noticed  (see  fig.  7)  that  without  the  localization  limiter  (that  is  for  a  =  0), 
as  the  number  of  elements  is  increased,  several  points  of  localization  develop,  and  these 
points  change  arbitrarily  with  mesh  refinement,  even  in  the  presence  of  damping.  These 
points of localization are visible both in the volumetric strain plots, as spikes, and in the radial displacement plots, where sharp discontinuities indicate separation along a surface.

The next group of solutions (fig. 8) examines the effect of the localization limiter.
These  solutions  converge  well  with  mesh  refinement,  and  furthermore,  they  are  very  similar 
to  those  found  with  the  imbricate  elements  approach  [20]. 

4.3  Static  problems 

When conducting the numerical simulations for static problems, where a stiffness matrix has to be developed, it was noticed that the introduction of the localization limiter did not completely remove all the unpleasant features present in calculations involving strain-softening materials. More precisely, when the strain-softening regime is incipient, the Newton-Raphson procedure often results in iterations that oscillate between two or more states and fail to converge to one equilibrium state. From a numerical point of view, this is linked to the fact that the tangent stiffness $K_T$ does not remain positive definite. In [19], a remedy for this difficulty was proposed. It consists of posing the problem as the minimization of the length of the residual vector:

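$$\text{Minimize:}\quad F = \mathbf{r}^T(\mathbf{d})\,\mathbf{r}(\mathbf{d}) \qquad (4.3)$$

where

$$\mathbf{r}(\mathbf{d}) = \mathbf{f}^{ext} - \mathbf{f}^{int}(\mathbf{d}) \qquad (4.4)$$

and require $F = 0$ at the minimum.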


This yields a better-behaved problem for the line-search procedure, and the rate of convergence of the Newton method is improved substantially. The method was also adapted in [19] so as to combine it with arc-length procedures.
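The following is a minimal scalar sketch of the idea. It is not the algorithm of [19]. It applies a Newton iteration to the residual (4.4) for a single degree of freedom, with a backtracking (Armijo) line search on the merit function $F = r^2$ of (4.3). Along the Newton increment $\Delta = r/K_T$, the slope is $dF/ds = -2F$ at $s = 0$. The increment is therefore a descent direction for $F$ even when $K_T$ is negative, which is the property the reformulation relies on when softening sets in. The function name `solve_with_line_search` and the softening law $f^{int}(d) = d\,e^{-d}$ used below are hypothetical choices for illustration.

```python
import math
from typing import Callable


def solve_with_line_search(f_int: Callable[[float], float],
                           k_t: Callable[[float], float],
                           f_ext: float, d0: float,
                           tol: float = 1e-10, max_iter: int = 100) -> float:
    """Solve r(d) = f_ext - f_int(d) = 0 for one degree of freedom.

    Each Newton increment delta = r / K_T is scaled by a step s found by
    backtracking until the merit function F = r**2 decreases sufficiently.
    """
    d = d0
    for _ in range(max_iter):
        r = f_ext - f_int(d)              # residual, cf. Eq. (4.4)
        if abs(r) < tol:
            return d
        merit = r * r                     # F = r^T r, cf. Eq. (4.3)
        kt = k_t(d)                       # tangent stiffness, may be negative
        if kt == 0.0:
            raise ZeroDivisionError("singular tangent stiffness")
        delta = r / kt                    # Newton increment
        s = 1.0
        # Along the Newton direction dF/ds = -2F at s = 0, so a small enough
        # step always lowers F (Armijo sufficient-decrease test, c = 1e-4).
        while s > 1e-12:
            r_trial = f_ext - f_int(d + s * delta)
            if r_trial * r_trial <= (1.0 - 2.0e-4 * s) * merit:
                break
            s *= 0.5
        d += s * delta
    raise RuntimeError("line-search Newton iteration did not converge")
```

With the hypothetical law $f^{int}(d) = d\,e^{-d}$ (peak at $d = 1$) and $f^{ext} = 0.3$, the iteration started at $d_0 = 0.5$ returns the root on the hardening branch. Started at $d_0 = 2.0$, it returns the root on the softening branch, where $K_T < 0$.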

To  test  the  effectiveness  of  the  localization  limiter  and  the  solution  strategy,  we 
consider  the  problem  of  a  one-dimensional  rod,  subjected  to  equal  and  opposite  loading  at  its 
two  ends,  as  illustrated  in  figure  9.  One  node  in  the  mesh  is  held  fixed,  so  as  to  prevent  rigid 
body translations. To trigger the appearance of a non-homogeneous strain distribution, a small
imperfection  is  introduced.  In  the  present  example,  this  was  accomplished  by  making  the 
cross-section  of  the  center  element  1%  smaller  than  the  cross-section  of  all  other  elements. 

Numerical studies of this problem were conducted based on the localization limiter defined previously in Eqn. (3.17), which in one dimension reduces to:


$$\sigma = \phi(\varepsilon) - a\,\varepsilon_{,xx} \qquad (4.5)$$
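Below is a minimal sketch of how (4.5) can be evaluated on a uniform one-dimensional grid. It assumes a second-order central difference for $\varepsilon_{,xx}$ at interior nodes. The end nodes are skipped because they need boundary conditions on the strain gradient that are not specified here. The function name and the linear $\phi$ in the test are hypothetical.

The sketch also shows why the term stabilizes the problem. Superpose a perturbation $\delta\cos(kx)$ on a uniform strain $\varepsilon_0$. Then $\varepsilon_{,xx} = -\delta k^2\cos(kx)$, so the stress increment for that mode is $(\phi'(\varepsilon_0) + a k^2)\,\delta\cos(kx)$. This stiffness is positive for short waves even where $\phi' < 0$.

```python
from typing import Callable, List


def gradient_limited_stress(strain: List[float], dx: float, a: float,
                            phi: Callable[[float], float]) -> List[float]:
    """Evaluate sigma = phi(eps) - a * eps_xx, Eq. (4.5), at interior nodes.

    eps_xx is approximated by the central difference
    (eps[i+1] - 2*eps[i] + eps[i-1]) / dx**2; the two end nodes are skipped
    because they need extra boundary conditions on the strain gradient.
    """
    if len(strain) < 3:
        raise ValueError("need at least three nodes")
    if dx <= 0.0:
        raise ValueError("dx must be positive")
    stress = []
    for i in range(1, len(strain) - 1):
        eps_xx = (strain[i + 1] - 2.0 * strain[i] + strain[i - 1]) / dx ** 2
        stress.append(phi(strain[i]) - a * eps_xx)
    return stress
```

Consider a quadratic strain profile $\varepsilon = x^2$ with $\phi(\varepsilon) = 200\,\varepsilon$ and $a = 0.5$. The central difference reproduces $\varepsilon_{,xx} = 2$ exactly, so the interior stresses equal $200x^2 - 1$ up to round-off.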

The elasto-plastic stress-strain law considered is also illustrated in fig. 9. It consists of a linear elastic part and an exponential branch that includes a strain-hardening portion followed by a softening one. At any point, unloading is elastic with Young's modulus $E$. The physical parameters used for the calculation were: Young's modulus $E = 200$, yield strain $\varepsilon_i = 0.05$, $\varepsilon_m = 0.3$, an exponential branch $\phi(\varepsilon) = E\,\varepsilon_i \ldots$, where $g(\varepsilon) = (1 - \ldots)$ involves $\varepsilon_i + \delta_0$ and $\varepsilon_m + \delta_0$, and a parameter controlling the convexity, $\delta_0 = 0.11$.

To  solve  this  problem,  the  line-search  technique  combined  with  the  arc-length  method 
with a linearized constraint equation described in [19] was used. It was first checked that without introducing the localization limiter, that is for a = 0, the deformation localizes in the
element  with  imperfection,  irrespective  of  its  size,  while  all  other  elements  unload  elastically. 
In  a  load-displacement  curve,  a  sharp  decrease  is  observed  once  strain-softening  is  attained, 
and  even  a  snap-back  behavior  can  be  observed,  which  could  not  be  captured  with  a  pure 
displacement  control  strategy. 

Calculations  were  then  conducted  with  the  localization  limiter  defined  in  Eqn.  (4.5), 
for  several  values  of  the  parameter  a.  The  strain  distribution  along  the  rod  for  various  load 
levels  is  given  in  fig.  10a.  These  strain-profiles  are  very  close  in  shape  to  the  ones  obtained 
by Coleman and Hodgdon [15] in their study of the effects of the localization limiter (4.5) on strain localization for a rigid plastic material with a parabolic law. Essentially, a finite localization zone emerges that is practically constant in size. Within it the strain increases but remains bounded, while in the rest of the rod the material unloads elastically. In the finite element calculation, that localized zone spans a few elements of the mesh. The size of the zone is directly related to the value of a.
Load-displacement curves for various values of a are reported in fig. 10b. They exhibit a milder negative slope with increasing a. It should be pointed out that, without the line-search procedure described above, the Newton-Raphson procedure fails to converge near the critical point.


5.  CONCLUSIONS 

Localization  limiters  can  be  classified  as  nonlocal,  differential  and  rate  limiters.  A 
Fourier  analysis  of  the  wave-propagation  problem  shows  that  the  introduction  of  nonlocal  or 
differential  limiters  leads  to  governing  equations  where  short  waves,  which  are  likely  to 
develop  with  the  onset  of  strain-softening,  have  a  bounded  growth.  In  dynamic  problems, 
strain-softening  causes  the  governing  equations  to  lose  strict  hyperbolicity;  it  was  shown  for 
example  that  they  become  elliptic  in  at  least  one  direction  for  the  antiplane  problem.  With  the 
gradient-type  localization  limiter,  the  dynamic  equations  change  from  hyperbolic  to  parabolic 
for  the  one-dimensional  case.  The  character  of  the  amplification  spectrum  of  the  integral  and 
differential  limiters  is  similar  and  they  become  identical  in  the  limit  as  the  magnitude  of  the 
parameter  governing  the  limiter  goes  to  zero. 
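The bounded growth of short waves can be checked with a few lines of arithmetic. For illustration only, assume a 1D rod of density $\rho$ governed by the gradient-type law (4.5), with a linear softening branch $\phi'(\varepsilon) = -E_t < 0$. Substituting $u = e^{st + ikx}$ (with $k$ the wave number) into $\rho u_{,tt} = \phi' u_{,xx} - a\,u_{,xxxx}$ gives

$$\rho s^2 = E_t k^2 - a k^4 .$$

Without the limiter ($a = 0$), the growth rate $s = k\sqrt{E_t/\rho}$ is unbounded as $k \to \infty$. For $a > 0$, the growth rate peaks at $k^2 = E_t/(2a)$ with $s_{max} = E_t/(2\sqrt{a\rho})$, and it vanishes for $k^2 \ge E_t/a$. The function names below are hypothetical.

```python
import math


def softening_growth_rate(k: float, a: float, e_t: float, rho: float) -> float:
    """Growth rate s >= 0 of the mode exp(s*t + i*k*x) for
    rho*u_tt = -e_t*u_xx - a*u_xxxx (linear softening, gradient limiter).

    Returns 0 when the mode does not grow (s**2 <= 0).
    """
    s2 = (e_t * k ** 2 - a * k ** 4) / rho
    return math.sqrt(s2) if s2 > 0.0 else 0.0


def peak_growth_rate(a: float, e_t: float, rho: float) -> float:
    """Maximum growth rate over k, attained at k**2 = e_t / (2*a); needs a > 0."""
    if a <= 0.0:
        raise ValueError("growth is unbounded without the limiter (a = 0)")
    return e_t / (2.0 * math.sqrt(a * rho))
```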

The  differential  localization  limiter  proposed  by  Coleman  and  Hodgdon  [15]  based  on 
the  introduction  of  the  second  derivative  of  the  strain  in  the  stress  expression  was 
implemented  in  the  context  of  static  problems.  Numerical  studies  showed  that  it  allows  for 
the development of a localized strain zone spanning several elements of the mesh.
However,  the  addition  of  the  limiter  does  not  guarantee  positive  definiteness  of  the  tangent 
matrix. 
Figure 2. Linearized analysis: amplification functions of the gradient-type limiter and of the integral-type limiter versus the wavelength k.
Fig. 3. Problem descriptions: a) 1D-rod problem, 2L = 40; b) spherically symmetric problem, interior radius R_1 = 10, exterior radius R_2 = 100.
Fig. 4. Rod problem, strain plots at time T = 39.0 for different meshes, without the higher order term limiter (a = 0). In enclosed box, stress-strain curve: E is Young's modulus. For the spherically symmetric problem, E is replaced by K (bulk modulus), E_t by K_t, and ε by ε_v (volumetric strain).
Fig. 6. Rod problem: a) Energy dissipated vs. number of elements, with and without localization limiter. b) Size δ of the localization zone as a function of √a.
Fig. 7. Spherically symmetric problem, radial displacement and volumetric strain plots, without the higher order term limiter (a = 0).


Fig. 8. Spherically symmetric problem, radial displacement and volumetric strain plots, with limiter (a = 0.169), for meshes of 41, 81 and 121 elements.


Figure 9. Traction curve φ(ε) versus ε. In enclosed box, problem description: rod length 2L = 10, 40 nodal points.
Figure 10. a) Strain vs. spatial coordinate profiles at four different load steps with higher order term limiter (a = 0.5). b) Load-displacement curves for different values of a.
Abstract


In  any  explosive  device,  the  chemical  reaction  of  the  explosive  takes  place  in 
a  thin  zone  just  behind  the  shock  front.  The  finite  size  of  the  reaction  zone 
is responsible for three effects: the pressure generated by the explosive is lower near the boundaries, the detonation velocity is lower near a boundary than away from it, and the detonation velocity is lower for a divergent wave than for a plane wave.

In  computer  models  that  are  used  for  engineering  design  calculations,  the 
simplest  treatment  of  the  explosive  reaction  zone  is  to  ignore  it  completely.  Most 
explosive modeling is still done this way. The neglected effects are small when the reaction zone is very much smaller than the explosive's physical dimensions. When the ratio of the explosive's detonation reaction-zone length to a representative system dimension is of the order of 1/100, neglecting the reaction zone is not adequate.

An  obvious  solution  is  to  model  the  reaction  zone  in  full  detail.  At  present, 
there  is  not  sufficient  computer  power  to  do  so  economically.  Recently  we  have 
developed  an  alternative  to  this  standard  approach.  By  transforming  the  governing 
equations  to  the  proper  intrinsic-coordinate  frame,  we  have  simplified  the  analysis 
of  the  two-dimensional  reaction-zone  problem.  When  the  radius  of  curvature  of  the 
detonation  shock  is  large  compared  to  the  reaction-zone  length,  the  calculation  of 
the  two-dimensional  reaction  zone  can  be  reduced  to  a  sequence  of  one-dimensional 
problems. 


I.  Introduction 

Describing  the  propagation  of  detonation  in  complex  multi-dimensional  explosive  geometries 
is  an  important  and  ongoing  problem  in  the  design  process  for  explosively  driven  devices.  In  order 
for  the  design  of  the  explosive  system  to  be  successful,  two  requirements  need  to  be  met.  First, 
the detonation of the explosive system must be robust, that is, relatively insensitive to variations in the initial conditions, such as changes in temperature and variations in the initiation system. At the same time, the explosive system must be safe from accidental initiation of detonation. The parameter P, the ratio of the explosive's detonation reaction-zone length to a representative system dimension, controls these properties. When P is small (relatively fast reaction), the system is robust but prone to accidental initiation. When P is large, the explosive is near its failure limit. This makes it harder to set off accidentally but also more sensitive to variations in the initial conditions. A value of P of about 0.01 is a good compromise: problems of accidental initiation are minimized, yet the detonation remains relatively insensitive to initial conditions.

For most explosive geometries, this ratio is small enough that the integrated momentum through the reaction zone is small in comparison to that in the broad region where the reaction products expand and do work on their surroundings. Thus the reaction zone has little direct influence on the process of driving inert materials that are in contact with it. However, the indirect influences of the reaction zone on the calculation can be much more important. When $P \approx 0.01$, a significant fraction of the explosive charge experiences reduced detonation pressure and velocity near boundaries. It also experiences a slower detonation velocity everywhere for a divergent detonation than for a plane one. These effects, in turn, lead to large errors in zeroth-order quantities such as the time of detonation arrival and the two-dimensional detonation wave shape. From the point of view of the designer, this is a difficult computational regime. Not only does he need to resolve the broad region where the reaction products expand and do work on their surroundings, but he must also resolve the thin reaction zone.

Because  of  the  disparate  lengths  of  the  reaction  zone  and  the  products  expansion  wave,  most  of 
the  explosive  design  codes  in  use  today  employ  some  variant  of  the  constant-detonation-velocity 
“Huygens”  construction  to  propagate  the  detonation  wave.  This  method  for  propagating  the 
detonation  only  works  well  for  explosives  for  which  the  reaction  zone  can  be  ignored  (i.e.,  P  is  less 
than  1/1000).  Ad  hoc  “fixes”  of  this  simple  model  have  been  tried  to  model  systems  with  larger 
values  of  P.  For  example,  the  detonation  velocity  may  arbitrarily  be  set  to  some  lower  value  near 
the  edge.  These  have  met  with  only  limited  success. 
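The constant-velocity “Huygens” construction mentioned above can be sketched in a few lines. The arrival time at a point is the shortest path length through the explosive from the initiation point, divided by a fixed detonation velocity. The sketch below assumes a uniform grid and runs Dijkstra's algorithm on 8-connected cells, which only approximates Euclidean path lengths in off-grid directions. The function name and the boolean-grid representation are hypothetical and are not those of any particular design code. Because the velocity is constant, no reaction-zone effects (edge slowdown, curvature dependence) appear, which is exactly the limitation discussed above.

```python
import heapq
import math
from typing import List, Optional, Tuple


def huygens_arrival_times(explosive: List[List[bool]], source: Tuple[int, int],
                          h: float, velocity: float) -> List[List[Optional[float]]]:
    """Detonation arrival times t = (shortest path through explosive) / velocity.

    explosive[i][j] is True for cells filled with explosive; h is the cell size.
    Dijkstra's algorithm on the 8-connected grid approximates the constant-
    velocity Huygens construction; cells that are never reached get None.
    """
    n_rows, n_cols = len(explosive), len(explosive[0])
    times = [[math.inf] * n_cols for _ in range(n_rows)]
    i0, j0 = source
    times[i0][j0] = 0.0
    heap = [(0.0, i0, j0)]
    moves = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    while heap:
        t, i, j = heapq.heappop(heap)
        if t > times[i][j]:
            continue  # stale heap entry
        for di, dj in moves:
            a, b = i + di, j + dj
            if 0 <= a < n_rows and 0 <= b < n_cols and explosive[a][b]:
                t_new = t + h * math.hypot(di, dj) / velocity
                if t_new < times[a][b]:
                    times[a][b] = t_new
                    heapq.heappush(heap, (t_new, a, b))
    return [[None if math.isinf(t) else t for t in row] for row in times]
```

Blocking part of the grid (for example, an inert barrier) forces the front to travel around the obstacle. The arrival time beyond the barrier is then set by the detour path length.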

With  all  of  its  shortcomings,  the  simple  “Huygens”  method  has  one  real  advantage, 
computational  speed.  Since  the  reaction  zone  does  not  need  to  be  modeled,  design  calculations 
are  fast  enough  to  allow  many  design  iterations  to  be  tried.  This  is  an  important  feature  that 
design  codes  need  to  have. 

In  order  to  improve  on  this  simple  method,  the  reaction  zone  must  be  modeled.  This  of 
course  requires  knowledge  of  the  equation  of  state  (eos)  of  the  partially  reacted  explosive  and 
of  the  reaction  rate.  When  explicit  information  is  available,  one  can  in  principle  follow  the 
standard  approach  and  do  multi-dimensional  simulations  that  resolve  both  the  reaction  zone  and 
the  explosive  products  region.  Typically  we  have  only  limited  constitutive  information:  the  shock 
Hugoniot  of  the  “unreacted”  explosive,  an  equation  of  state  of  the  explosive  products,  and  a 
compatible energy-release rate calibrated to one-dimensional experiments.

To  be  useful,  a  numerical  simulation  of  the  reaction  zone  must  be  able  to  resolve  all  of  the 
important features of the flow. Fickett^ has shown that when the standard one-dimensional (1D) Lagrangian-mesh artificial-viscosity methods are used, roughly 15 computational cells are needed in the reaction zone to get 10% accuracy. This translates into many tens of thousands of computational cells for a typical two-dimensional (2D) numerical calculation done with a uniform grid method. Even with today's supercomputers, such calculations take many hours of computation time, so they are not practical for routine use. When one reduces the number of cells in the calculation to get sensible computational times, the accuracy of the calculations suffers.
In large measure, the inordinately large computation time is a result of the lack of sophistication of the standard uniform grid methods. The mesh size that is needed to achieve reasonable resolution in the reaction zone is excessively fine for the broad products expansion region. Today researchers are developing a variety of improved methods that include such features as: (1) multi-grid techniques that employ moving fine zoning near shocks,^ (2) schemes based on the method of characteristics such as CIR and Godunov,^.^ and (3) shock-tracking methods.'* To date, however, none of these methods has reached the point of maturity where it could replace the standard method for routine detonation calculations.

The  central  issue  in  improved  2D  calculations  of  detonation  is  a  high-accuracy  calculation 
of the reaction-zone structure, plus a relatively coarse-grid calculation of the following products release wave. One way of getting a high-accuracy calculation of the reaction-zone structure is to do it analytically. This alternative brings with it the direct computational benefit plus the advantage of a theoretical understanding of the 2D detonation process. With such an understanding, we could make a fast high-resolution wave-tracking code that solves the reaction-zone flow analytically and
the  broad  products  region  with  a  coarse-grid  numerical  simulation.  This  increased  knowledge  also 
brings  with  it  the  insights  that  lead  to  the  improvements  that  are  necessary  if  some  of  the  more 
sophisticated  computational  methods  mentioned  above  are  to  become  practical  tools. 

An  analytical  solution  of  the  general  2D  time-dependent  detonation  problem  is  not  within 
reach.  However,  in  many  applications  of  explosives,  one  observes  that  the  radius  of  curvature 
of  the  detonation  shock  is  large  in  comparison  with  the  reaction-zone  length.  Recently  we  have 
developed  an  alternative  to  the  standard  numerical  approach  that  is  based  on  the  large  radius  of 
curvature  limit.  By  transforming  the  governing  equations  to  the  proper  intrinsic-coordinate  frame, 
we have simplified the analysis of the 2D reaction-zone problem and reduced it to a sequence of one-dimensional problems. The coordinate frame of choice is one in which the spatial coordinate axes are everywhere locally parallel and perpendicular to the shock. The governing equations consist of a kinematic equation that describes the progress of disturbances moving along the shock, and equations for the reaction-zone dynamics that describe the quasi-steady flow normal
to  the  shock  (i.e.,  through  the  reaction  zone).  We  call  this  method  DETONATION  SHOCK 
DYNAMICS  (DSD). 

This  paper  gives  a  brief  review  of  DSD.  We  have  divided  it  into  four  sections.  In  Section  II, 
we  give  an  overview  of  the  theoretical  model.  This  section  is  divided  into  three  subsections.  In 
Shock  Kinematics,  we  briefly  describe  our  coordinate  system  and  the  kinematics  of  the  detonation 
shock.  The  subsection  entitled  Boundary  Conditions  is  devoted  to  a  discussion  of  the  boundary 
conditions  that  are  applied  at  the  edges  of  the  explosive.  In  Reaction-Zone  Dynamics,  the  Euler 
equations  are  transformed  to  the  intrinsic-coordinate  frame,  and  the  analysis  that  leads  to  the 
quasi-steady  description  is  briefly  reviewed.  In  Section  III,  we  demonstrate  how  our  theory  can  be 
used  to  study  a  representative  explosive  design  problem.  In  Section  IV,  we  summarize  our  results. 
II. Overview of the Theory


The  thrust  behind  our  theory  is  the  concept  that  the  response  of  the  detonation  shock  is 
local,  and  is  governed  by  its  current  local  configuration.  Philosophically,  it  is  an  extension  of 
Whitham’s  geometrical  shock  dynamics  to  detonation.®  Our  theory  is  a  uniform  perturbation 
theory,  which  is  based  on  the  notion  that  the  radius  of  curvature  of  the  shock  is  large  compared 
to  the  reaction-zone  length.  It  is  a  nonlinear  theory  that  can  be  used  to  describe  arbitrarily 
large  departures  of  the  detonation  shock  shape  from  the  plane  one-dimensional  state.  From  the 
results  of  our  theoretical  calculations,  the  following  picture  has  emerged.  In  many  situations,  the 
dynamics  of  the  detonation  reaction  zone  is  decoupled  from  the  evolution  of  the  large  following 
reaction  products  expansion  wave,  and  is  controlled  by  the  flow  near  the  shock.  As  a  result,  we  find 
that  the  important  waves  in  the  reaction  zone,  either  rarefactions  or  compressions,  are  transverse 
waves.  Our  theory  describes  how  waves  on  the  shock  are  generated  (e.g.,  near  an  explosive  edge) 
and move along the shock (see Figure 1).

There are three components to the theory: (1) a kinematic condition for the shock surface, (2) conditions to be satisfied at the boundaries of the explosive, and (3) the flow dynamics in the
direction  normal  to  the  shock.  We  will  briefly  describe  each  of  these. 


Figure 1. A schematic diagram that shows how chemical/mechanical energy is transported laterally through the reaction zone. The kinematic condition is applied along (1), boundary conditions are applied at (2), and the reaction-zone dynamics describes the flow along (3). To leading order, the reaction zone is insulated from rarefactions from the rear.


Our theory is based on the time-dependent, two-dimensional, reactive Euler equations. As a consequence, the detonation shock (shock) is a surface of discontinuity. Since we wish to treat detonation-wave evolution in complicated two-dimensional geometries, we have developed our theory in a problem-determined intrinsic-coordinate system (see Figure 2). It is a shock-centered frame that moves with the local normal detonation-shock velocity ($D_n$). The space variables are the distances $\xi$ and $\eta$, locally parallel and perpendicular to the shock.

Figure 2. The intrinsic-coordinate system that was used in the calculation. The shock curvature is $\kappa = \phi_{,\xi}$, and $z = z_s - \eta\cos\phi$, $r = r_s - \eta\sin\phi$.

a.  Shock  Kinematics 

The principal object of the theory is to calculate the shock shape as a function of time. The intrinsic representation of a curve, such as the shock, is in terms of its curvature ($\kappa$) as a function of arc length along the shock ($\xi$) and time ($t$). In this coordinate system, the shock shape is described by the shock angle ($\phi$) as a function of $\xi$ and $t$. In terms of these variables, the shock curvature is $\kappa = \phi_{,\xi}$, where the comma subscript indicates a partial derivative with respect to arc length. The laboratory coordinates of the shock are given by

$$z_s = z_e - \int_0^{\xi} \sin(\phi)\,d\xi' \ , \qquad r_s = r_e + \int_0^{\xi} \cos(\phi)\,d\xi' \ , \qquad (1)$$

where $z_e$ and $r_e$ are the coordinates of the edge. Typically we are most interested in describing the changes in the shock shape that result from the interaction between the shock and an explosive edge. For such problems, the most convenient origin for $\xi$ is the point where the shock meets the edge. Figure 3 shows a schematic representation of the shock, including the independent variable ($\xi$) and the definitions of the dependent variables $D_n$ and $\phi$. The Cartesian unit vectors are $\mathbf{e}_z$ and $\mathbf{e}_r$.
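Equation (1) can be evaluated numerically once $\phi(\xi)$ is known. The minimal sketch below uses the trapezoidal rule. It also includes the mapping $z = z_s - \eta\cos\phi$, $r = r_s - \eta\sin\phi$ (the Figure 2 relations), which gives the point a distance $\eta$ behind the shock along its normal. The function names and the uniform $\xi$ grid are assumptions for illustration.

For the test shape $\phi = \xi/R$ (constant curvature $\kappa = 1/R$), the locus is an arc of radius $R$ centered at $(z_e - R,\ r_e)$. Setting $\eta = R$ maps every shock point to that center of curvature.

```python
import math
from typing import Callable, List, Tuple


def shock_locus(phi: Callable[[float], float], z_e: float, r_e: float,
                xi_max: float, n: int) -> List[Tuple[float, float]]:
    """Laboratory points (z_s, r_s) of the shock from Eq. (1), trapezoidal rule.

    The arc length xi runs from 0 at the edge (z_e, r_e) to xi_max in n steps.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    h = xi_max / n
    z, r = z_e, r_e
    points = [(z, r)]
    for k in range(n):
        p0, p1 = phi(k * h), phi((k + 1) * h)
        z -= 0.5 * h * (math.sin(p0) + math.sin(p1))
        r += 0.5 * h * (math.cos(p0) + math.cos(p1))
        points.append((z, r))
    return points


def intrinsic_to_lab(z_s: float, r_s: float, phi: float,
                     eta: float) -> Tuple[float, float]:
    """Point at distance eta behind the shock point (z_s, r_s) along its normal."""
    return z_s - eta * math.cos(phi), r_s - eta * math.sin(phi)
```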

The  geometric  compatibility  conditions  for  a  moving  two-dimensional  surface  are  given  in 
Whitham® 


$$\phi_{,\alpha} = -\frac{1}{A}\,D_{n,\beta} \qquad (2)$$

and

$$\phi_{,\beta} = \frac{1}{D_n}\,A_{,\alpha} \qquad (3)$$


The variable $\alpha$ is equivalent to time and labels a particular shock surface. The constant-$\beta$ rays are orthogonal to the shock and are its propagators. The streamtube area is $A$, where at fixed $\alpha$

$$d\xi = A\,d\beta \qquad (4)$$

(i.e., $A$ measures the shock area between two adjacent constant-$\beta$ rays or streamlines).


Figure 3. Intrinsic coordinates and shock kinematics. The independent variables are arc length ($\xi$) and time ($t$), while the dependent variables are the normal shock velocity ($D_n$) and the shock normal angle ($\phi$). The curves $\beta$ = constant are normal to the shock, and $\phi_e$ is the angle between the tangent to the edge and the normal to the shock.


For the problems of interest in condensed-phase detonation, the shock is seldom normal to the explosive boundary. As a result, the coordinate $\beta$ is not a convenient independent variable, since boundary conditions must be applied at the edge. Changing independent variables from $(\alpha, \beta)$ to $(t, \xi)$, we have


$$d\xi = A\,d\beta + B\,d\alpha \qquad (5)$$

and

$$dt = d\alpha \ , \qquad (6)$$

where the coefficient $B$ describes the change in arc length with time along a constant-$\beta$ ray. Under this transformation, the surface kinematics [i.e., Eq. (2)] takes the form of a one-dimensional Burgers equation along the shock, in which $B$ is the wave velocity and $\kappa$ is the transport term:

$$\phi_{,t} + B\,\phi_{,\xi} = -D_{n,\xi} \qquad (7)$$


The coefficient $B$ is obtained by requiring that the transformation [Eqs. (5) and (6)] be solvable, i.e., that the mixed partial derivatives of $\xi$ agree. It follows that

$$A_{,\alpha} = B_{,\beta} \ . \qquad (8)$$

From Eqs. (3) and (8) it follows that

$$B = \int_0^{\xi} \phi_{,\xi}\, D_n\, d\xi' + B_0(t) \ . \qquad (9)$$

The function $B_0(t)$ is the rate at which shock arc length crosses the $\beta$ = constant ray that intercepts the edge. It is given by

$$B_0(t) = D_{ne} \tan(\phi_e) \ . \qquad (10)$$
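Equations (9) and (10) give $B$ directly from the current shock state: $B = D_{ne}\tan\phi_e + \int_0^{\xi} D_n\,\phi_{,\xi}\,d\xi'$. The minimal sketch below assumes that $\phi$ and $D_n$ are known at nodes ordered along the shock, starting at the edge ($\xi = 0$). The function name is hypothetical. Because $\phi_{,\xi}\,d\xi = d\phi$, the integral can be accumulated as $\int D_n\,d\phi$ with the trapezoidal rule. This is exact when $D_n$ varies linearly with $\phi$ between nodes, and it needs no explicit $\xi$ spacing.

```python
import math
from typing import List


def arc_length_rate(phi: List[float], d_n: List[float]) -> List[float]:
    """Nodal values of B from Eqs. (9) and (10).

    phi[j] and d_n[j] are the shock angle and normal velocity at nodes
    xi_0 = 0 (the edge) < xi_1 < ...; since phi_xi dxi = dphi, the integral
    in Eq. (9) is accumulated with the trapezoidal rule in phi.
    """
    if len(phi) != len(d_n) or not phi:
        raise ValueError("phi and d_n must be non-empty and of equal length")
    b = [d_n[0] * math.tan(phi[0])]          # B_0 = D_ne tan(phi_e), Eq. (10)
    for j in range(len(phi) - 1):
        b.append(b[-1] + 0.5 * (d_n[j] + d_n[j + 1]) * (phi[j + 1] - phi[j]))
    return b
```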

This  intrinsic  form  of  the  shock-surface  kinematics  is  fundamental  to  any  shock-tracking 
method  that  seeks  to  describe  the  evolution  of  shocks  of  arbitrary  shape  in  a  uniform  manner. 
Clearly, Eqs. (7) and (9) simply yield a constraint between $D_n$ and $\kappa = \phi_{,\xi}$. However, if a second algebraic relation between $D_n$ and $\kappa$ can be obtained, then this constraint can be converted into a one-dimensional partial-differential equation for the shock surface. If we then prescribe the initial shape ($\phi$) of the surface, together with a boundary condition at the intersection of the shock and the explosive boundary, Eq. (7) can be solved to get the 2D shock locus at any subsequent time.

b. Boundary Conditions

For  the  problems  we  consider  here,  we  do  not  need  to  study  the  complex  flow  or  the  detailed 
boundary  conditions  that  apply  in  the  vicinity  of  the  explosive  boundary.  It  will  be  sufficient  to 
consider  only  the  condition,  if  any,  that  must  be  applied  at  the  locus  generated  by  the  intersection 
of  the  shock  and  the  edge.  We  consider  only  an  explosive/vacuum  interface. 

At  such  an  interface,  the  flow  experiences  a  singularity.  In  the  explosive,  the  pressure  just 
behind the detonation shock is near the Chapman-Jouguet (CJ) pressure; just outside the explosive, the pressure is at or near zero. For the flow to execute such a transition, a singularity of Prandtl-Meyer (PM) type must be embedded in the flow at the intersection of the shock and the edge. Since the flow at this point is locally quasi-steady, it can only be sonic or supersonic (as seen by an observer riding along the edge/shock intersection locus). We will
discuss  the  consequences  that  result  from  having  flows  of  these  two  types. 

Along the edge/shock locus, the sonic parameter is a function of the normal detonation velocity along the edge, $D_{ne}$, and the shock interface angle, $\phi_e$. For a polytropic eos, with $\gamma$ the polytropic exponent, the expression is


7  +  l\ 

1  Dl, 

(11) 
where $C$ is the sound speed, $|U|$ is the magnitude of the particle velocity in the edge/shock locus frame, and $D_{n*}$ is the minimum value of $D_n$ for a one-dimensional detonation.


If the flow is supersonic along the locus, then disturbances from the edge cannot propagate into the detonation reaction zone, because the interface moves laterally faster than acoustic waves do. In this case, no boundary condition is applied, and the interface does not affect the detonation. When the flow turns subsonic, $D_{ne}$ and $\phi_e$ must be adjusted so that the sonic condition, $C^2 - |U|^2 = 0$, is maintained. This condition serves as a boundary condition for the flow.

The following rule summarizes the edge/shock locus boundary condition: monitor the sonic parameter on the locus. If $C^2 - |U|^2 < 0$, the flow is supersonic and no condition is applied. When the flow is either sonic or subsonic, $D_{ne}$ and $\phi_e$ must be adjusted to satisfy the condition $C^2 - |U|^2 = 0$.
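The rule above can be sketched as a small routine. It does not commit to a particular form of the polytropic expression (11). Instead, the sonic parameter $C^2 - |U|^2$ is passed in as a user-supplied function. The sketch also simplifies the adjustment: it holds $D_{ne}$ fixed and solves only for $\phi_e$ by bisection, whereas in the theory both $D_{ne}$ and $\phi_e$ adjust. The function name is hypothetical. The toy sonic parameter in the test (constant $C = 0.5$ and $|U| = D_{ne}\tan\phi_e$) is purely illustrative and is not the expression of the paper.

```python
import math
from typing import Callable


def apply_edge_condition(sonic: Callable[[float, float], float],
                         d_ne: float, phi_e: float,
                         phi_lo: float, phi_hi: float,
                         tol: float = 1e-12, max_iter: int = 200) -> float:
    """Return the edge angle after applying the edge/shock-locus rule.

    sonic(d_ne, phi) is the sonic parameter C**2 - |U|**2 on the locus.
    Supersonic (sonic < 0): no condition, phi_e is returned unchanged.
    Sonic or subsonic (sonic >= 0): phi is found by bisection in
    [phi_lo, phi_hi] so that sonic(d_ne, phi) = 0, with d_ne held fixed.
    """
    if sonic(d_ne, phi_e) < 0.0:
        return phi_e                       # supersonic: edge does not act
    f_lo, f_hi = sonic(d_ne, phi_lo), sonic(d_ne, phi_hi)
    if f_lo == 0.0:
        return phi_lo
    if f_hi == 0.0:
        return phi_hi
    if f_lo * f_hi > 0.0:
        raise ValueError("sonic point not bracketed by [phi_lo, phi_hi]")
    lo, hi = phi_lo, phi_hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = sonic(d_ne, mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid          # root lies in [mid, hi]
        else:
            hi = mid                       # root lies in [lo, mid]
    return 0.5 * (lo + hi)
```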

c. Reaction-Zone Dynamics

As noted above, Eq. (7) is a one-dimensional partial-differential relation that $D_n$ and $\phi$ must satisfy if they are to describe a two-dimensional shock. If a second relation between $D_n$ and $\phi$ can be found, we can convert this relation to a partial-differential equation (pde). In the process, we reduce the two-dimensional shock-tracking problem to a one-dimensional one. For a number of cases, we have found such a second relation between $D_n$ and $\kappa = \phi_{,\xi}$. When it exists, this relation contains all the necessary reaction-zone dynamics, namely the consequences of the interaction of the chemical heat release with the flow. To find it, we must solve the time-dependent two-dimensional Euler equations. To solve these equations for complex explosive geometries, we must express them in terms of a natural system of coordinates that simplifies their form. In the limit that the radius of curvature of the shock is large compared to the reaction-zone length, the coordinates shown in Figure 2 are particularly convenient. Bertrand curves that are everywhere parallel to the shock are the constant-$\eta$ coordinates; the lines perpendicular to these curves are the constant-$\xi$ coordinates. These coordinates are related to the laboratory Cartesian frame by


$$z = z_s - \eta\cos\phi \ , \qquad (12)$$

$$r = r_s - \eta\sin\phi \ , \qquad (13)$$

where $z_s$ and $r_s$ are given by Eq. (1). Expressed in these coordinates, the Euler equations are


mass 

rj  -  momentum 
^  -  momentum 

and 

energy 

The  chemical  rate  law  is 
rate 


Ilp  +  f){KU,,-U,,^rt  —0  , 

(14) 

CUr,-^P,„  +  ...  =  Q  , 

(15) 

LU^  +  ^P^^-  Dn  (Ur,  =0  , 

(16) 

HE  —  + . . .  =  0 

(17) 

CX  —  z 

(18) 

We have displayed only those terms that are necessary to do the leading-order theory in the small-$\kappa$ limit. In the above, the operator $H$ is …

Here $\rho$ is the density and $U_\eta$ is the $\eta$-component of the particle velocity (at leading order $U_\eta > 0$ and $U_{\eta,\eta} < 0$). $U_\xi$ is the $\xi$-component of the particle velocity ($U_\xi = 0$ at the shock), and $P$ is the pressure. $\lambda$ is the degree of reaction ($\lambda = 0$ at the shock), $Z$ is the chemical rate, and $E$ is the specific internal energy. The above equations, the standard one-dimensional shock conditions, the kinematics [Eq. (7)], and appropriate initial/boundary conditions completely define the 2D problem that must be solved. Even in the small-$\kappa$ limit, this is a formidable task.


What we have shown recently is that, for certain rate-law forms (i.e., expressions for $r$), the important large-scale dynamics is quasi-steady. We considered relatively long-scale disturbances to the shock,

$$ \kappa = O(\epsilon^2) \ll 1, \tag{20} $$

$$ D_n = D_{CJ} + O(\epsilon^2), \tag{21} $$

and two spatio-temporal regimes:

(1) “fast” dynamics, with shock deflection $\phi = O(\epsilon^{3/2})$; (22)

(2) quasi-steady dynamics, with shock deflection $\phi = O(\epsilon)$ or larger. (23)


The “fast” scale problem was necessary to treat the early influence of the two-dimensional initial/boundary data, and to describe the hydrodynamic wavehead that separates the reaction zone into parts that are either influenced or uninfluenced by the edge. As the flow evolves, the “fast” scale perturbations become smaller, and the disturbances to the one-dimensional state become larger and quasi-steady. This quasi-steady regime is particularly simple: the Euler equations reduce to the steady nozzle equations [a steady, cylindrically symmetric system of ordinary differential equations (ode)],

$$ [(D_n - U_\eta)\rho]_{,\eta} + \rho\kappa U_\eta = 0, \tag{24} $$

etc.

The only parameters in these equations, besides the fixed constitutive parameters, are $D_n$ and $\kappa$. That is, the initial/boundary data do not appear in the large-change reaction-zone dynamics. In some sense, then, the dynamics is local and universal. The resulting one-dimensional problem is simply the detonation “eigenvalue” problem considered by Wood and Kirkwood. In this limit, the detonation shock propagation problem decouples from the product expansion region. Therefore, for detonation, no ad hoc approximations are necessary to get a theory for the shock evolution that is local. At least, this is the case for diverging detonation.

The quasi-steady problem defines $D_n(\kappa)$. With $\kappa$ specified, $D_n$ is determined by solving an eigenvalue problem. In addition to yielding $D_n(\kappa)$, this solution also gives the reaction-zone end state as a function of $\kappa$. Thus, for an important class of problems, the reaction-zone dynamics is given by $D_n(\kappa)$, and the two-dimensional shock-evolution problem is reduced to a one-dimensional problem.

Two points are worth noting. First, the $D_n(\kappa)$ relation contains only limited constitutive information about the explosive; the constants in this relation are integrals of this information through the reaction zone. Second, $D_n(\kappa)$ is independent of initial/boundary data. Therefore, when detailed constitutive information about the reaction zone is not known (the typical situation for condensed-phase explosives), $D_n(\kappa)$ can be measured directly via simple steady-state two-dimensional hydrodynamic experiments. Thus we have a way of using simple experiments to calibrate the reaction-zone dynamics. In turn, the calibrated $D_n(\kappa)$ relation can be used to predict the shock evolution in complex explosive geometries.

Direct calculations of $D_n(\kappa)$ performed with the simple polytropic equation of state (eos) show that $D_n(\kappa)$ is sensitive to the form of the rate law. Calculations were done for two state-independent rates with different depletion forms: square-root depletion,

$$ r = (1 - \lambda)^{1/2}, \tag{25} $$

and simple depletion,

$$ r = (1 - \lambda). \tag{26} $$

The $D_n(\kappa)$ rule for Eq. (25) is

$$ D_n = 1 - \alpha\kappa, \tag{27} $$

while for Eq. (26) we have

$$ D_n = 1 + \beta\kappa\ln(\kappa) - \alpha\kappa. \tag{28} $$

The constants $\alpha$ and $\beta$ are not to be confused with Whitham’s curvilinear coordinates. Compacted into these two constants is everything that we need to know about the constitutive laws. $D_{CJ}$ is set to one. In the next section, we give a brief tutorial that describes how this theory can be applied to explosive engineering design problems.
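The two rate-law-specific rules can be tabulated directly once $\alpha$ and $\beta$ are known. The following Python sketch assumes $D_{CJ} = 1$, as in the text; the function names and the sample values of $\alpha$ and $\beta$ are hypothetical placeholders, since in practice these constants would come from the eigenvalue calculation or from a fit to steady two-dimensional experiments.

```python
import math


def dn_square_root_depletion(kappa: float, alpha: float) -> float:
    """Eq. (27): D_n = 1 - alpha*kappa (D_CJ scaled to one)."""
    return 1.0 - alpha * kappa


def dn_simple_depletion(kappa: float, alpha: float, beta: float) -> float:
    """Eq. (28): D_n = 1 + beta*kappa*ln(kappa) - alpha*kappa (D_CJ scaled to one)."""
    if kappa < 0.0:
        raise ValueError("this sketch only evaluates diverging shocks (kappa >= 0)")
    if kappa == 0.0:
        return 1.0  # kappa*ln(kappa) -> 0 as kappa -> 0, so D_n -> D_CJ = 1
    return 1.0 + beta * kappa * math.log(kappa) - alpha * kappa


if __name__ == "__main__":
    alpha, beta = 0.5, 0.1  # hypothetical constants, for illustration only
    for kappa in (0.0, 0.01, 0.05, 0.1):
        print(f"kappa={kappa:5.2f}  Eq.(27): {dn_square_root_depletion(kappa, alpha):.4f}"
              f"  Eq.(28): {dn_simple_depletion(kappa, alpha, beta):.4f}")
```

The $\kappa\ln\kappa$ term in Eq. (28) has an unbounded slope at $\kappa = 0$, so at small curvature the two depletion laws can give noticeably different velocity deficits even for the same $\alpha$.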

III. Applications
a. Chapman-Jouguet Wave

The simplest time-dependent problem that can be done is the constant-velocity detonation, or “Huygens” construction, for a diverging detonation. For convenience we take $D_n = 1$. Equation (7) then becomes the simple nonlinear-wave equation for the shock angle (see Figure 3),

$$ \phi_t + (\phi - \phi_e)\,\phi_\xi = 0, \tag{29} $$

where $\phi_e$ is the value of $\phi$ at the edge (i.e., at $\xi = 0$). Equation (29) states that $\phi$ is constant along the characteristic lines $\xi - (\phi - \phi_e)t = \text{constant}$, that is,

$$ \phi = \phi_0 \quad\text{along}\quad \xi - (\phi_0 - \phi_e)\,t = \xi_0. \tag{30} $$

If we consider a flow where the two-dimensional shock is initially convergent, then the initial angle, $\phi_0$, is a decreasing function of the initial arc length, $\xi_0$. Such a flow looks compressive, in the sense that the characteristic lines are convergent. After a finite time, some of the characteristics cross one another and the solution becomes multivalued. Physically, the rule $D_n = 1$ does not apply to a convergent detonation, so we will not consider this case further.
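Equation (30) can be evaluated directly by moving each point of the initial shock along its characteristic. The Python sketch below (function names and sample profiles are hypothetical) does this for a sampled initial angle $\phi_0(\xi_0)$ and also estimates the first time at which neighbouring characteristics cross, which is finite only when $\phi_0$ decreases with $\xi_0$, the convergent case described above.

```python
import math
from typing import Callable, List, Tuple


def shock_at_time(phi0: Callable[[float], float], xi0: List[float],
                  t: float, phi_e: float) -> List[Tuple[float, float]]:
    """Return (xi, phi) pairs at time t from Eq. (30): phi = phi0 along xi = xi0 + (phi0 - phi_e) t."""
    return [(x0 + (phi0(x0) - phi_e) * t, phi0(x0)) for x0 in xi0]


def first_crossing_time(phi0: Callable[[float], float], xi0: List[float]) -> float:
    """Earliest time at which adjacent characteristics meet (math.inf if they never do)."""
    t_cross = math.inf
    for a, b in zip(xi0, xi0[1:]):
        # Characteristic speeds are phi0 - phi_e; phi_e cancels in the difference.
        speed_diff = phi0(b) - phi0(a)
        if speed_diff < 0.0:  # the trailing characteristic is faster, so the two converge
            t_cross = min(t_cross, (b - a) / -speed_diff)
    return t_cross


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    print(first_crossing_time(lambda x: -0.5 * x, grid))  # convergent: about 2
    print(first_crossing_time(lambda x: 0.5 * x, grid))   # divergent: inf
```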


Figure 4. A prototypical diverging detonation problem. The wave is propagated with $D_n = 1$, a “Huygens” construction. Below the dashed line, the wave is free of boundary effects and expands as a circle. Above the dashed line, the wave shape is determined by applying the sonic condition along the circular edge of radius $R_3$.


When the two-dimensional shock is initially divergent, the initial angle is an increasing function of arc length, and the characteristic lines are rarefaction-like. An example of a divergent-wave problem that is often encountered in designs is shown in Figure 4. It is a prototypical example of a diverging detonation that features the diffraction of the detonation (i.e., the “shadow zone” problem). The left-most vertical line is a symmetry plane; the lower horizontal line and the upper circular arc are the edges of the explosive. The wave is initially circular with radius $R_2$. Since the wave is perpendicular to the horizontal edge, the flow along that edge/shock locus is sonic, and the edge does not influence the shock evolution. When the expanding wave first reaches the circular boundary, the flow along the upper edge/shock locus is supersonic. It remains supersonic until the detonation reaches the point where the dashed line is tangent to the arc. The region above the dashed line is not in direct line of sight of the initial data; it is a “shadow zone.” Diffraction is the process that allows the wave to spread into this region. The solution in this region is determined by the boundary data supplied along the circular edge.

In both regions of the problem, the solution takes a simple form. The great advantage of our formulation over older methods is this simplicity of representation. The calculations shown in Figure 4 are free of reaction-zone effects. We conclude this section by showing how detonation shock dynamics can be used to include the important finite-size reaction-zone effects for this example.



We assume that the reaction-zone dynamics is given by Eq. (27),

$$ D_n = 1 - \alpha\kappa, $$

and introduce the change of variable


(31) 


where $\phi_e$ is the angle that the tangent to the edge makes with the reference direction $e$. Substituting these into the kinematic equation [i.e., Eq. (7)] yields a “Burgers” equation

RiDnt 


Rz  COs(^e) 


(32) 


as the propagator for the shock. The independent variables in Eq. (32) are scaled time ($t$) and scaled arc length ($x$). The finite-length reaction-zone effects enter this equation as the transport term on the right-hand side. This is similar to the structure of the wave-hierarchy problems that arise in one-dimensional wave propagation in reactive materials. The second term on the left-hand side represents the diffraction effect. Equation (32) is a one-dimensional parabolic pde. Thus, in the quasi-steady limit, the reactivity acts to smooth the shock locus.
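The smoothing role of a parabolic propagator can be illustrated with the generic viscous Burgers equation $u_t + (u^2/2)_x = \nu u_{xx}$. This is an assumed stand-in, not the scaled Eq. (32): the coefficients, the Rusanov (local Lax-Friedrichs) flux, the fixed end values, and the time-step rule are choices made only for this Python sketch. Increasing $\nu$, which here plays the role of the reaction-zone term, smooths an initially sharp front.

```python
import math
from typing import List


def burgers_step(u: List[float], dx: float, dt: float, nu: float) -> List[float]:
    """One explicit step of u_t + (u^2/2)_x = nu*u_xx; the end values are held fixed."""
    n = len(u)
    flux = []
    for i in range(n - 1):
        a = max(abs(u[i]), abs(u[i + 1]))  # local wave speed for the Rusanov flux
        flux.append(0.25 * (u[i] ** 2 + u[i + 1] ** 2) - 0.5 * a * (u[i + 1] - u[i]))
    new = u[:]
    for i in range(1, n - 1):
        convection = (flux[i] - flux[i - 1]) / dx
        diffusion = nu * (u[i + 1] - 2.0 * u[i] + u[i - 1]) / dx ** 2
        new[i] = u[i] + dt * (diffusion - convection)
    return new


def solve_burgers(u0: List[float], length: float, t_end: float, nu: float,
                  cfl: float = 0.4) -> List[float]:
    """March to t_end with a time step limited by both convection and diffusion."""
    dx = length / (len(u0) - 1)
    u, t = list(u0), 0.0
    while t < t_end - 1e-12:
        umax = max(1e-12, max(abs(v) for v in u))
        dt_diff = dx * dx / (2.0 * nu) if nu > 0.0 else math.inf
        dt = min(cfl * dx / umax, cfl * dt_diff, t_end - t)
        u = burgers_step(u, dx, dt, nu)
        t += dt
    return u


def max_slope(u: List[float], dx: float) -> float:
    """Largest absolute first difference, a simple measure of front sharpness."""
    return max(abs(b - a) / dx for a, b in zip(u, u[1:]))


if __name__ == "__main__":
    n = 101
    u0 = [1.0 if i < n // 2 else 0.0 for i in range(n)]  # sharp front at x = 0.5
    for nu in (0.002, 0.02):
        u = solve_burgers(u0, 1.0, 0.2, nu)
        print(f"nu={nu}: max slope {max_slope(u, 0.01):.1f}")
```

With both limits scaled by `cfl = 0.4`, the update is a convex combination of neighbouring values, so the computed solution stays within the range of the initial data.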


Equation (32) was solved numerically for the design problem shown in Figure 4. A mesh with one thousand points along the shock was used. The computation time was one minute on the Cray-1 supercomputer. The results of the wave-tracking calculation, for a set of parameter values that highlight the finite-length reaction-zone effects, are shown in Figure 5.
Figure 5. The DSD calculation of the example considered in Figure 4. The reaction-zone dynamics rule was $D_n = 1 - \alpha\kappa$, where the magnitude of $\alpha$ is shown. Three calculations are displayed: (dashed) $D_n = 1$, the “Huygens” construction; (dotted) $D_n = 1 - \alpha\kappa$, a circularly expanding wave; and (solid) the full DSD calculation.

The important parameters in this calculation are $\alpha/R_1$, the ratio of the reaction-zone length parameter to the radius of the booster, and the ratio of the booster radius to the edge radius.

The dashed contours correspond to the standard “Huygens” construction studied in Figure 4. The dotted contours show the cylindrically expanding finite-length reaction-zone wave without any edge effect. The solid contours show the complete DSD calculation, including the edge effects. Although the results shown in the figure speak well for themselves, a few comments are in order. Even in regions of the flow that are not influenced by the edge, the finite-length reaction-zone effects cause the detonation to lag behind the “Huygens” wave. Near the lower edge, the complete DSD calculation is strongly curled back. Along this edge, the phase velocity of the detonation wave is initially low, but as time passes it builds back to that for a cylindrically expanding wave. Along the upper surface, no edge effect is observed until the detonation wave passes into the “shadow zone.” After this occurs, the detonation wave continually undergoes diffraction. Since the phase velocity at the edge quickly reaches a steady value that is well below $D_{CJ}$, the curl-back is more pronounced in this region than at the lower edge. The value of this velocity is a function of the radius of the upper explosive/vacuum interface.

IV. Summary

We  have  developed  a  theory  for  propagating  two-dimensional  detonation  shocks  in  complex 
explosive  assemblies.  The  three  components  of  our  method  are: 

(1)  shock  kinematics  [Eq.  (7)], 

(2) boundary conditions [Eq. (11)], and

(3)  reaction-zone  dynamics  [e.g.,  Eq.  (27)]. 
In spirit it is the detonation analog of Whitham’s inert shock propagation theory, geometrical shock dynamics. It is a rationally derived theory that applies when the radius of curvature of the detonation shock is large compared to the reaction-zone length. A fully nonlinear theory, it describes the large-amplitude changes in the two-dimensional detonation shock that occur over long times.

The DSD method that we have developed is a powerful tool that can be used to efficiently model reaction-zone effects in numerical simulations of detonation. Using this method, a model explosive design calculation was performed with about one minute of supercomputer time. This is to be compared with the many hours that are required for modest-resolution full numerical simulations of explosive assemblies. In addition to the direct computational benefit, this theory also increases our understanding of time-dependent two-dimensional detonation. For example, this theory defines the relationship between the detonation wave phase velocity and the radius of the explosive edge in the “shadow zone.”
Reactive-Euler Induction Models

ABSTRACT. A unified formulation for the induction period of all thermal reaction problems is presented using high-activation-energy asymptotics. The important parameters in the nondimensional equations are the ratios of the characteristic reaction, acoustic, and conduction times in the thermally disturbed parcel of a reactive gas of dimension $L$. In larger systems, transport effects are negligible and the induction period is controlled by reactive gasdynamics equations. Two of these models are analyzed.

1. INTRODUCTION. The evolution of thermal explosions in gaseous systems depends on the interaction between chemical heat release, conductive thermal losses, and the effects of compressibility. The latter factor can accelerate reaction rates in constant-volume systems where compression heating plays a role [1], [2], [7]. In unconfined systems, however, the conversion of some thermal energy to kinetic energy may retard the appearance of thermal runaway. Systems in which conductive losses are unimportant will inevitably explode, perhaps faster than diffusive systems. In this sense it is important to be able to predict which physical processes control the evolution of an exothermic reaction in a specific gaseous system. In this paper, we present the results of our recent studies, which provide a rational basis for deciding the correct induction model for a given physical system, and we analyze these models mathematically.

Consider a reactive, viscous, heat-conducting, compressible gas in an equilibrium state defined by the dimensional quantities $p_0 = p(x,0)$, $\rho_0 = \rho(x,0)$, $T_0 = T(x,0)$, $y_0 = y(x,0)$, and $u_0 = u(x,0)$, which represent pressure, density, temperature, concentration, and velocity, respectively.

At time $t = 0$, assume a small initial disturbance is created on a length scale $L$. Define $\bar{x} = x/L$ as the new position vector. Let $\bar{t} = t/t_R$ be the new time scale, where $t_R$ is a reference time to be determined later. Nondimensionalize the system variables, letting $\bar{p} = p/p_0$, $\bar{\rho} = \rho/\rho_0$, $\bar{T} = T/T_0$, $\bar{y} = y/y_0$, and $\bar{u} = u\,t_R/L$. Assume a single one-step irreversible reaction which has a rate law described by Arrhenius kinetics. The complete combustion system can then be written in nondimensional form, where the bar notation has been dropped, as:


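$$
\begin{aligned}
&\rho_t + \nabla\cdot(\rho u) = 0, \qquad\text{(1a)}\\
&\rho(u_t + u\cdot\nabla u) = -\frac{1}{\gamma}\Big(\frac{t_R}{t_A}\Big)^2\nabla p + \mathrm{Pr}\Big(\frac{t_R}{t_C}\Big)\mu\Big[\Delta u + \frac{1}{3}\nabla(\nabla\cdot u)\Big], \qquad\text{(1b)}\\
&\rho c_v(T_t + u\cdot\nabla T) = \gamma\Big(\frac{t_R}{t_C}\Big)\nabla\cdot(k\nabla T) - (\gamma-1)p\,(\nabla\cdot u)\\
&\qquad + 2\mu\gamma(\gamma-1)\mathrm{Pr}\Big(\frac{t_R}{t_C}\Big)\Big(\frac{t_A}{t_R}\Big)^2\Big[\frac12\{\nabla\otimes u + (\nabla\otimes u)^T\}:\nabla\otimes u - \frac{1}{3}(\nabla\cdot u)^2\Big] + t_R\,h\,B\rho y\exp\Big[-\frac{1}{\epsilon T}\Big], \qquad\text{(1c)}\\
&\rho(y_t + u\cdot\nabla y) = \mathrm{Le}\Big(\frac{t_R}{t_C}\Big)\nabla\cdot(\rho D\nabla y) - t_R B\rho y\exp\Big[-\frac{1}{\epsilon T}\Big], \qquad\text{(1d)}\\
&p = \rho T, \qquad\text{(1e)}
\end{aligned}
$$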


where $\mu = \mu/\mu_0$, $D = D/D_0$, $c_p = c_p/c_{p0}$, $c_v = c_v/c_{v0}$, $k = k/k_0$, and $\kappa = \kappa/\kappa_0$, where $\kappa = k/(\rho c_p)$ is the thermal diffusivity and $c_p$ and $c_v$ are the specific heats. Also, $\gamma = c_{p0}/c_{v0}$ is the gas parameter, $\epsilon = RT_0/E$ is the nondimensional inverse of the activation energy, $\mathrm{Pr} = c_{p0}\mu_0/k_0$ the Prandtl number, $\mathrm{Le} = D_0/\kappa_0$ the Lewis number, $c_0 = (\gamma R T_0)^{1/2}$ the initial sound speed, $t_A = L/c_0$ the acoustic time scale, $t_C = L^2/\kappa_0$ the conduction time scale, and $h = h y_0/(c_{v0}T_0)$ the nondimensional heat of reaction.


2. INDUCTION PERIOD MODELS. As in [8], assume that $\mathrm{Pr} = O(1)$, $\mathrm{Le} = O(1)$, $h = O(1)$, and that $\epsilon \ll 1$. Using the method of activation-energy asymptotics, we seek simpler models of the combustion process. In (1c,d), the reaction terms contain an expression of the form $\exp(-1/(\epsilon T))$. For $\epsilon \ll 1$, an induction-period theory can be described in terms of the perturbed variables


$$ \rho = 1 + \epsilon m,\quad p = 1 + \epsilon P,\quad T = 1 + \epsilon\theta,\quad u = \epsilon v,\quad y = 1 - \epsilon c, \tag{2} $$


where we assume that the initial temperature disturbance is $O(\epsilon)$. If $O(\epsilon)$ terms are ignored, we obtain the induction model for a gaseous system from (1) using (2):


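$$
\begin{aligned}
&m_t + \nabla\cdot v = 0, \qquad\text{(3a)}\\
&v_t = -\frac{1}{\gamma}\Big(\frac{t_R}{t_A}\Big)^2\nabla P + \mathrm{Pr}\Big(\frac{t_R}{t_C}\Big)\Big[\Delta v + \frac13\nabla(\nabla\cdot v)\Big], \qquad\text{(3b)}\\
&\theta_t = \frac{t_R h B}{\epsilon}e^{-1/\epsilon}e^{\theta} + \gamma\Big(\frac{t_R}{t_C}\Big)\Delta\theta - (\gamma-1)\nabla\cdot v\\
&\qquad + 2\gamma(\gamma-1)\mathrm{Pr}\,\epsilon\Big(\frac{t_R}{t_C}\Big)\Big(\frac{t_A}{t_R}\Big)^2\Big[-\frac13(\nabla\cdot v)^2 + \frac12\{\nabla\otimes v + (\nabla\otimes v)^T\}:\nabla\otimes v\Big], \qquad\text{(3c)}\\
&c_t = \frac{t_R B}{\epsilon}e^{-1/\epsilon}e^{\theta} + \mathrm{Le}\Big(\frac{t_R}{t_C}\Big)\Delta c, \qquad\text{(3d)}\\
&P = m + \theta. \qquad\text{(3e)}
\end{aligned}
$$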


The induction model (3) contains three time scales, $t_R$, $t_A$, and $t_C$, which depend on the particular thermochemical system, with the reference time $t_R$ yet to be specified. The character of the induction models depends intimately on the ratios formed from these three time scales. We will consider initial temperature disturbances on a macroscopic length scale, so that $t_A/t_C \ll 1$. If we assume that the perturbation temperature $\theta$ and the concentration variation $c$ are caused by the chemical reaction process, then for small $\epsilon$ there should be a balance of the accumulation terms $\theta_t$ and $c_t$ in (3) with the reaction terms involving $e^{\theta}$. It follows that the reference time can be defined by

$$ t_R = \frac{\epsilon\,e^{1/\epsilon}}{B}, \tag{4} $$

which represents the chemical time for a reaction initiated at $T_0$, multiplied by $\epsilon$. The three time scales are now completely defined, and the reduced induction models depend on their ratios.


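The first case to be considered is that for

$$ \frac{t_R}{t_C} = \alpha = O(1); \tag{5} $$

then the induction momentum, energy, and species equations of (3) can be written as

$$
\begin{aligned}
&\Big(\frac{t_A}{t_R}\Big)^2\Big\{v_t - \alpha\,\mathrm{Pr}\Big[\Delta v + \frac13\nabla(\nabla\cdot v)\Big]\Big\} = -\frac{1}{\gamma}\nabla P, \qquad\text{(6a)}\\
&\theta_t = h e^{\theta} + \alpha\gamma\Delta\theta - (\gamma-1)\nabla\cdot v, \qquad\text{(6b)}\\
&c_t = e^{\theta} + \mathrm{Le}\,\alpha\,\Delta c. \qquad\text{(6c)}
\end{aligned}
$$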
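Since we are assuming that the initial disturbances are spatially macroscopic, so that $(t_A/t_R)^2 \ll 1$, we have from the inductive momentum equation (6a) that $P = P(t)$ to a first approximation. Combining the mass equation (3a) and the energy equation (6b) with $P = m + \theta$,

$$ \theta_t - \alpha\Delta\theta = \frac{h}{\gamma}e^{\theta} + \frac{\gamma-1}{\gamma}P'(t). \tag{7} $$

For a bounded container $\Omega$, since the total mass must be conserved, $\int_\Omega \rho(x,t)\,dx = \mathrm{vol}(\Omega)$, which implies $\int_\Omega m(x,t)\,dx = 0$ and hence

$$ P(t) = \frac{1}{\mathrm{vol}(\Omega)}\int_\Omega \theta(x,t)\,dx. $$

We can thus rewrite (7) as

$$ \theta_t - \alpha\Delta\theta = \frac{h}{\gamma}e^{\theta} + \frac{\gamma-1}{\gamma}\,\frac{1}{\mathrm{vol}(\Omega)}\int_\Omega \theta_t(x,t)\,dx \tag{8} $$

and impose initial-boundary conditions of the type

$$ \theta(x,0) = \theta_0(x),\ x \in \Omega; \qquad \theta(x,t) = 0,\ (x,t) \in \partial\Omega\times(0,\infty). \tag{9} $$

This model (8)-(9), with the last term representing the effects of spatially homogeneous gas compression, was originally derived by Kassoy and Poland [7] and was analyzed in [1].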

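If the ratio $t_R/t_C = \alpha = o(1)$, so that the reaction time is much shorter than the conduction time, then (3b,c,d) can be written

$$
\begin{aligned}
&v_t = -\frac{\alpha^2}{\gamma(t_A/t_C)^2}\nabla P + \mathrm{Pr}\,\alpha\Big[\Delta v + \frac13\nabla(\nabla\cdot v)\Big], \qquad\text{(10a)}\\
&\theta_t = h e^{\theta} - (\gamma-1)\nabla\cdot v + \alpha\gamma\Delta\theta + 2\gamma(\gamma-1)\mathrm{Pr}\,\epsilon\,\frac{(t_A/t_C)^2}{\alpha}\Big[-\frac13(\nabla\cdot v)^2 + \frac12\{\nabla\otimes v + (\nabla\otimes v)^T\}:\nabla\otimes v\Big], \qquad\text{(10b)}\\
&c_t = e^{\theta} + \mathrm{Le}\,\alpha\,\Delta c. \qquad\text{(10c)}
\end{aligned}
$$

Because $\alpha = o(1)$, viscous, conductive, and diffusive effects are weak. Three subcases are of interest, all of which lead to reactive-Euler explosions.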

I) For $t_A \ll t_R \ll t_C$, then from (10a) $P = P(t)$ to a first approximation, and the energy equation becomes

$$ \theta_t = \frac{h}{\gamma}e^{\theta} + \frac{\gamma-1}{\gamma}P'(t) = \frac{h}{\gamma}e^{\theta} + \frac{\gamma-1}{\gamma}\,\frac{1}{\mathrm{vol}(\Omega)}\int_\Omega \theta_t(x,t)\,dx, \tag{11} $$

where $\Omega$ is a bounded container.


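II) For $t_R = O(t_A) \ll t_C$, to first order the momentum equation (10a) becomes

$$ v_t = -\frac{1}{\gamma}\Big(\frac{t_R}{t_A}\Big)^2\nabla P, \tag{12} $$

and (3) reduces to

$$
\begin{aligned}
&\theta_t - \frac{\gamma-1}{\gamma}P_t = \frac{h}{\gamma}e^{\theta},\\
&v_t + \frac{1}{\gamma}\Big(\frac{t_R}{t_A}\Big)^2\nabla P = 0, \qquad\text{(13)}\\
&\nabla\cdot v + \frac{1}{\gamma}P_t = \frac{h}{\gamma}e^{\theta}.
\end{aligned}
$$

III) For $t_R \ll t_A \ll t_C$, (10a) reduces to $v_t = 0$, or $v = v(x)$. This implies that inertial confinement of the heated gas is dominant. Aspects of short-time inertial confinement have been discussed by Clarke et al. [3], Dold [4], and Jackson et al. [5], [6].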


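3. The First Reactive-Euler Model. For an arbitrary bounded container $\Omega \subset \mathbb{R}^n$, the reactive-Euler model (11) can be written, with $\phi = \theta$ and $\delta = h/\gamma$, as

$$ \phi_t = \delta e^{\phi} + \frac{\gamma-1}{\gamma}\,\frac{1}{\mathrm{vol}(\Omega)}\int_\Omega \phi_t(x,t)\,dx, \tag{14} $$

with

$$ \phi(x,0) = \phi_0(x), \quad x \in \Omega, \tag{15} $$

assuming $\phi_0(x)$ is continuous and bounded on $\Omega$. By integrating (14) over $\Omega$, we see that (14) is equivalent to

$$ \phi_t = \delta e^{\phi} + \beta\int_\Omega e^{\phi}\,dx, \qquad \beta = \frac{(\gamma-1)\delta}{\mathrm{vol}(\Omega)}. \tag{16} $$

The IBVP (16)-(15) has a unique nonextendable solution $\phi(x,t)$ on $\Omega\times[0,\sigma)$, where either $\sigma = +\infty$, or $\sigma < \infty$ with $\lim_{t\to\sigma^-}\sup\{\phi(x,t) : x \in \Omega\} = \infty$.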

The initial value problem

$$ a_t = \delta e^{a}, \quad (x,t) \in \Omega\times(0,T), \tag{17} $$

$$ a(x,0) = \phi_0(x), \quad x \in \Omega, \tag{18} $$

has the explicit solution

$$ a(x,t) = -\ln\big[e^{-\phi_0(x)} - \delta t\big], \tag{19} $$

which blows up in finite time $T = \delta^{-1}\exp(-\phi_0(x_m))$, where $x_m$ is any point in $\Omega$ at which $\phi_0(x)$ attains its absolute maximum. Since $a(x,t)$ is a lower solution for (16)-(15), the solution $\phi(x,t)$ satisfies

$$ \phi(x,t) \ge -\ln\big[e^{-\phi_0(x)} - \delta t\big], $$

and hence $\phi(x,t)$ blows up in finite time $\sigma$ with $\sigma < T$.
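The explicit solution (19) and its blowup time are easy to check numerically. In the Python sketch below (function names and the sample profile $\phi_0(x) = -x^2$ are illustrative assumptions), the blowup time is taken from the largest value of $\phi_0$, and a finite-difference derivative can be used to confirm that (19) satisfies $a_t = \delta e^{a}$.

```python
import math
from typing import Sequence


def a_explicit(phi0_x: float, delta: float, t: float) -> float:
    """Eq. (19): a = -ln[exp(-phi0) - delta*t]; returns inf once the bracket is <= 0."""
    bracket = math.exp(-phi0_x) - delta * t
    return math.inf if bracket <= 0.0 else -math.log(bracket)


def blowup_time(phi0_values: Sequence[float], delta: float) -> float:
    """T = exp(-max phi0)/delta: the bracket in Eq. (19) first vanishes where phi0 is largest."""
    return math.exp(-max(phi0_values)) / delta


if __name__ == "__main__":
    delta = 0.5
    xs = [-1.0 + 0.01 * i for i in range(201)]
    T = blowup_time([-x * x for x in xs], delta)
    print(T)  # max of -x^2 is 0 (at x = 0), so T = 1/delta = 2.0
```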

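To get more information about $\sigma$, consider the implicit representation

$$ \phi(x,t) = a(x,r(t)) + B(r(t)), \tag{20} $$

where $a(x,r)$ is the solution of (17)-(18), and $r(t)$, $B(r)$ are scalar functions to be determined. As given in (20), $\phi(x,t)$ is a solution of (16)-(15) if and only if

$$ r' = e^{B(r)}, \quad r(0) = 0, \tag{21} $$

$$ B'(r) = \beta\int_\Omega e^{a(x,r)}\,dx = \beta\int_\Omega \big[e^{-\phi_0(x)} - \delta r\big]^{-1}dx, \quad B(0) = 0. \tag{22} $$

Integrating (22) gives $B$ in closed form,

$$ B(r) = \frac{\beta}{\delta}\int_\Omega \big[a(x,r) - \phi_0(x)\big]dx = -\frac{\beta}{\delta}\int_\Omega \ln\big[1 - \delta r e^{\phi_0(x)}\big]dx, \tag{23} $$

and $r$ then satisfies

$$ \frac{dr}{dt} = \exp\Big\{-\frac{\beta}{\delta}\int_\Omega \ln\big[1 - \delta r e^{\phi_0(x)}\big]dx\Big\}, \quad r(0) = 0, \tag{24} $$

which can be solved by quadrature.

From (20), we thus have

Theorem 1. The number $\sigma$ is the blowup time for the solution $\phi(x,t)$ of (14)-(15) if and only if $r(\sigma) = T$ is the blowup time for the solution $a(x,\tau)$ of (17)-(18), and thus

$$ \sigma = \int_0^{T} e^{-B(r)}\,dr, \qquad T = \delta^{-1}e^{-\phi_0(x_m)}, $$

where $x_m$ is any point in $\Omega$ at which $\phi_0$ has an absolute maximum.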
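Theorem 1 reduces the blowup time to a one-dimensional quadrature, $\sigma = \int_0^T e^{-B(r)}dr$ with $B(r)$ from (23). The Python sketch below evaluates both integrals with midpoint rules on a one-dimensional container $\Omega = (x_{lo}, x_{hi})$; the container, the grid sizes, and the requirement that the caller supply the exact maximum value of $\phi_0$ are assumptions of the sketch. Because $B(r) \ge 0$, the computed $\sigma$ never exceeds $T$, and it reduces to $T$ when $\beta = 0$.

```python
import math
from typing import Callable


def sigma_blowup_1d(phi0: Callable[[float], float], phi0_max: float,
                    x_lo: float, x_hi: float, delta: float, beta: float,
                    nx: int = 400, nr: int = 400) -> float:
    """Approximate sigma = int_0^T exp(-B(r)) dr (Theorem 1), with B(r) from Eq. (23)."""
    T = math.exp(-phi0_max) / delta              # blowup time of a_t = delta*exp(a)
    dx = (x_hi - x_lo) / nx
    exp_phi = [math.exp(phi0(x_lo + (j + 0.5) * dx)) for j in range(nx)]
    dr = T / nr
    sigma = 0.0
    for k in range(nr):
        r = (k + 0.5) * dr                       # midpoints keep r < T, so every log is finite
        integral = sum(math.log(1.0 - delta * r * e) for e in exp_phi) * dx
        B = -(beta / delta) * integral           # Eq. (23); B >= 0 because each log is <= 0
        sigma += math.exp(-B) * dr
    return sigma


if __name__ == "__main__":
    # Hypothetical data: phi0(x) = -x^2 on (-1, 1), so phi0_max = 0 and T = 1/delta.
    print(sigma_blowup_1d(lambda x: -x * x, 0.0, -1.0, 1.0, delta=1.0, beta=0.5))
```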

By considering (20) and (23), we can observe that $\phi(x,t)$ blows up at those points $x_m$ at which $\phi_0(x)$ has its absolute maximum, provided that $B(r(\sigma)) < \infty$. This is true if and only if $\int_\Omega a(x,r(\sigma))\,dx < \infty$, which in turn is true provided that $\int_\Omega \ln\big[1 - e^{-(\phi_0(x_m) - \phi_0(x))}\big]dx > -\infty$.

Similarly, $\phi(x,t)$ blows up everywhere in $\Omega$ at $\sigma$ if and only if $B(r(\sigma)) = \infty$. Thus,

Theorem 2. (a) The solution $\phi(x,t)$ of (14)-(15) blows up only at those points $x_m$ of $\Omega$ at which $\phi_0(x)$ has its absolute maximum if and only if

$$ \int_\Omega \ln\big[1 - e^{-(\phi_0(x_m) - \phi_0(x))}\big]dx > -\infty. \tag{25} $$

(b) The solution $\phi(x,t)$ blows up everywhere in $\Omega$ at $\sigma$ if and only if

$$ \int_\Omega \ln\big[1 - e^{-(\phi_0(x_m) - \phi_0(x))}\big]dx = -\infty. \tag{26} $$

The integral in (25) is finite if there is at most a finite number of critical points $x_m \in \Omega$ at which $\phi_0$ has an absolute maximum and if, at each $x_m$, $\phi_0(x)$ is strictly concave down and analytic in a neighborhood of $x_m$. In this case, blowup occurs only at those $x_m$ at which $\phi_0$ has an absolute maximum. If, on the other hand, $\phi_0$ is too flat in a neighborhood of an $x_m$, then blowup occurs everywhere in $\Omega$.

A second method for representing the solution $\phi(x,t)$ of (14)-(15) is to set


$$ \Phi(x,t) = \phi(x,t) - \beta\int_\Omega \phi(x,t)\,dx. \tag{27} $$

Then $\Phi$ satisfies

(28) 

=  aFte* 

with 

$$ \Phi(x,0) = \phi_0(x) - \beta\int_\Omega \phi_0(x)\,dx, \tag{29} $$

where 

(30) 

f,  =  F(0)  =  0. 

By integrating (28) and using (30), we find that $\phi(x,t)$ can be expressed as

where  G{x,t)  =  -  aF{t),  k  =  Note  then  that  the  blowup  time  a  for  is 

given  from  (31)  by  F{(t)  =  ke 

Since  Po(t)  =  -  (t>o{x))dx,  we  have  from  (31) 


From  (31)  and  (32),  we  have 

(33)  <?(x,<)  =  M^)  +  — ^^0(0  -  /n(l  -  7e-<^'><"^F(0) 

7  K 

from which we can conclude that the temperature evolves from the initial value $\phi_0(x)$ through a purely time-dependent term related to the homogeneous pressure increase and a logarithmic evolution term with spatial dependence, which has a shape-preserving property.
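4. The Second Reactive-Euler Model. In one spatial dimension, the reactive-Euler model (13) can be written as

$$
\begin{aligned}
&\phi_t - \frac{\gamma-1}{\gamma}P_t = \delta e^{\phi},\\
&v_t + \frac{1}{\gamma}\Big(\frac{t_R}{t_A}\Big)^2 P_x = 0, \qquad (x,t) \in \mathbb{R}\times(0,\infty), \qquad\text{(34)}\\
&v_x + \frac{1}{\gamma}P_t = \delta e^{\phi},
\end{aligned}
$$

with

$$ \phi(x,0) = \phi_0(x),\quad P(x,0) = P_0(x),\quad v(x,0) = v_0(x), \tag{35} $$

continuous bounded functions on $\mathbb{R}$. Setting $a = (\gamma-1)/\gamma$, $b = \delta$, $c = \gamma^{-1}(t_R/t_A)^2$, and $d = 1/\gamma$, then, for $w = \phi - aP$, (34) becomes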


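$$
\begin{aligned}
&w_t = b\,e^{w + aP},\\
&v_t + cP_x = 0, \qquad\text{(36)}\\
&dP_t + v_x = b\,e^{w + aP},
\end{aligned}
$$

with

$$ w(x,0) = \phi_0(x) - aP_0(x),\quad v(x,0) = v_0(x),\quad P(x,0) = P_0(x). \tag{37} $$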

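Using the change of coordinates to the acoustic Riemann variables $P_\pm = P \pm (cd)^{-1/2}v$, we have

$$
\begin{aligned}
&w_t = b\,e^{w + a(P_+ + P_-)/2},\\
&P_{+,t} + \lambda P_{+,x} = \frac{b}{d}e^{w + a(P_+ + P_-)/2}, \qquad\text{(38)}\\
&P_{-,t} - \lambda P_{-,x} = \frac{b}{d}e^{w + a(P_+ + P_-)/2},
\end{aligned}
$$

where $\lambda = (c/d)^{1/2}$. Set $u = \mu P_+$, $v = \mu P_-$ with $\mu = a/2$, $A = b = \delta$, and $B = \mu b/d$; then

$$
\begin{aligned}
&w_t = A\,e^{w+u+v},\\
&u_t + \lambda u_x = B\,e^{w+u+v}, \qquad\text{(39)}\\
&v_t - \lambda v_x = B\,e^{w+u+v},
\end{aligned}
$$

with

$$
\begin{aligned}
&w(x,0) = \phi_0(x) - aP_0(x) \equiv \bar w(x),\\
&u(x,0) = \mu\big[(cd)^{-1/2}v_0(x) + P_0(x)\big] \equiv \bar u(x), \qquad\text{(40)}\\
&v(x,0) = -\mu\big[(cd)^{-1/2}v_0(x) - P_0(x)\big] \equiv \bar v(x).
\end{aligned}
$$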
Thus, we have shown that the reactive-Euler induction problem (34)-(35) is equivalent to the more symmetric problem (39)-(40). Problem (39)-(40) is closely related to the low-frequency mean-field equations considered by Majda and Rosales in [10] and to the disturbance equations considered by Jackson, Kapila, and Stewart in [6].


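Let $c^+ = \max[A,B]$, $c^- = \min[A,B]$, $m^+ = \sup_x\max\{\bar w(x), \bar u(x), \bar v(x)\}$, $m^- = \inf_x\min\{\bar w(x), \bar u(x), \bar v(x)\}$, and consider

$$ z' = c^{\pm}e^{3z}, \quad z(0) = m^{\pm}. \tag{41} $$

By comparison with (39)-(40),

$$ -\frac13\ln\big[e^{-3m^-} - 3c^-t\big] \le w(x,t),\ u(x,t),\ v(x,t) \le -\frac13\ln\big[e^{-3m^+} - 3c^+t\big]. \tag{42} $$

Hence, every solution $(w,u,v)$ of (39)-(40) blows up in finite time $\sigma$ with

$$ \frac{1}{3c^+e^{3m^+}} \le \sigma \le \frac{1}{3c^-e^{3m^-}}. \tag{43} $$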
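The comparison functions in (41) have the closed form $z(t) = -\tfrac13\ln[e^{-3m} - 3ct]$, so the bounds (43) are immediate to evaluate. The Python sketch below uses hypothetical function names; $A$, $B$, and $m^\pm$ must be supplied from the data of (39)-(40), and the sample values shown are placeholders.

```python
import math
from typing import Tuple


def comparison_solution(c: float, m: float, t: float) -> float:
    """Solution of z' = c*exp(3z), z(0) = m, i.e. z = -(1/3) ln[exp(-3m) - 3ct]."""
    bracket = math.exp(-3.0 * m) - 3.0 * c * t
    return math.inf if bracket <= 0.0 else -math.log(bracket) / 3.0


def blowup_time_bounds(A: float, B: float, m_plus: float, m_minus: float) -> Tuple[float, float]:
    """Bounds (43) on the blowup time sigma of (39)-(40); assumes A, B > 0."""
    c_plus, c_minus = max(A, B), min(A, B)
    lower = 1.0 / (3.0 * c_plus * math.exp(3.0 * m_plus))
    upper = 1.0 / (3.0 * c_minus * math.exp(3.0 * m_minus))
    return lower, upper


if __name__ == "__main__":
    # Hypothetical coefficients with A + 2B = 1 and hypothetical data bounds m+, m-.
    print(blowup_time_bounds(A=0.5, B=0.25, m_plus=0.2, m_minus=-0.1))
```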


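Note that $\phi(x,t) = w(x,t) + u(x,t) + v(x,t)$. Assume henceforth that $A + 2B = 1$ and that $\phi(x,t)$ blows up at $x_m \in \mathbb{R}$ at time $T$. We would like to describe how the blowup singularity evolves at $(x_m, T)$. Make the backward similarity change of variables

$$ \tau = -\ln(T - t), \qquad \eta = \frac{x - x_m}{(T - t)^{1/2}}, \tag{44} $$

with

$$
\begin{aligned}
&W = w + A\ln(T - t),\\
&U = u + B\ln(T - t), \qquad\text{(45)}\\
&V = v + B\ln(T - t),\\
&S = W + U + V = \phi + \ln(T - t);
\end{aligned}
$$

then

$$
\begin{aligned}
&W_\tau + \tfrac{\eta}{2}W_\eta = A(e^S - 1),\\
&U_\tau + \tfrac{\eta}{2}U_\eta + \lambda e^{-\tau/2}U_\eta = B(e^S - 1), \qquad\text{(46)}\\
&V_\tau + \tfrac{\eta}{2}V_\eta - \lambda e^{-\tau/2}V_\eta = B(e^S - 1),\\
&S_\tau + \tfrac{\eta}{2}S_\eta + \lambda e^{-\tau/2}(U_\eta - V_\eta) = e^S - 1.
\end{aligned}
$$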


To describe how the blowup singularity evolves would require analyzing the behavior of solutions of (46) as $\tau$ becomes infinite. To get an idea of what to expect, or hope for, consider the easier problem when there is no drift, i.e., $\lambda = 0$. Then $\phi_t = (A + 2B)e^{\phi} = e^{\phi}$, and the temperature $\phi$ blows up at time $T = e^{-\phi_0(x_m)}$ at the points $x_m$, where $x_m$ is an absolute maximum point for $\phi_0$. Then we know exactly when and where blowup occurs. We can also describe precisely how the singularity evolves. Let $z = \phi + \ln(T - t)$; then $z$ is the solution of

$$ z_\tau + \frac{\eta}{2}z_\eta = e^z - 1, \qquad z(\eta, -\ln T) = z_0(\eta) = \phi_0(\eta T^{1/2} + x_m) + \ln T, \tag{47} $$

which can be explicitly solved to give

$$ z(\eta,\tau) = -\ln\Big[1 - e^{\tau + \ln T}\Big(1 - e^{-z_0(\eta e^{-(\tau + \ln T)/2})}\Big)\Big]. \tag{48} $$

Thus,

$$ \lim_{\tau\to\infty} z(\eta,\tau) = -\ln\Big[1 - \frac{z_0''(0)}{2}\eta^2\Big] = -\ln\Big[1 - \frac{T\phi_0''(x_m)}{2}\eta^2\Big] \equiv H(\eta). $$

From this, we conclude that for $\lambda = 0$,

$$ \phi(x,t) + \ln\Big[(T - t) - \frac{T\phi_0''(x_m)}{2}(x - x_m)^2\Big] \to 0 $$

uniformly for $(x - x_m)^2 \le R(T - t)$ as $t \to T^-$, which gives a description of how the blowup evolves. We expect a similar type of behavior for (39)-(40). This has been confirmed formally by [6] and [4].
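For the no-drift case the convergence to $H(\eta)$ can be checked directly. Taking the hypothetical initial profile $\phi_0(x) = -x^2$ (so $x_m = 0$, $T = 1$, $\phi_0''(x_m) = -2$, and $H(\eta) = -\ln(1 + \eta^2)$), the exact solution $\phi = -\ln(e^{x^2} - t)$ of $\phi_t = e^{\phi}$ gives $z(\eta,\tau) = -\ln[1 + e^{\tau}(e^{\eta^2 e^{-\tau}} - 1)]$. The Python sketch evaluates this with `math.log1p` and `math.expm1` to avoid cancellation at large $\tau$.

```python
import math


def z_no_drift(eta: float, tau: float) -> float:
    """z = phi + ln(T - t) for phi_t = exp(phi), phi0(x) = -x^2, in the variables (44)."""
    # With T = 1: t = 1 - exp(-tau) and x = eta*exp(-tau/2).
    return -math.log1p(math.exp(tau) * math.expm1(eta * eta * math.exp(-tau)))


def limit_profile(eta: float, T: float = 1.0, phi0_xx: float = -2.0) -> float:
    """H(eta) = -ln[1 - (T/2) phi0''(x_m) eta^2]."""
    return -math.log(1.0 - 0.5 * T * phi0_xx * eta * eta)


if __name__ == "__main__":
    for tau in (0.0, 2.0, 5.0, 10.0):
        print(tau, z_no_drift(1.0, tau), limit_profile(1.0))
```

At $\tau = 0$ the sketch returns $z_0(\eta) = \phi_0(\eta)$, matching the initial condition in (47), and the printed values approach $H(1) = -\ln 2$ as $\tau$ grows.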
ABSTRACT

We report here on some of my recent work with my Ph.D. students (the co-listed authors) on current bifurcation problems in Combustion Theory, Fluid Flow, and Aerodynamics. Our approach has been both computational and analytical. Although much work remains to be done, the results presented here are new and the sharpest to date.

1.  Counting  the  Number  of  Solutions  in  Reactive  Flow  Problems. 


The nonlinear elliptic partial differential equation

$$ -\Delta u = \lambda\,e^{u/(1+\epsilon u)} \tag{1.1} $$

has been of considerable interest in Combustion Theory. In it, $u$ represents a temperature in a self-heating body $\Omega$ near explosion, $\lambda$ represents the lumped exothermicity of the substance under consideration, and $1/\epsilon$ is the activation energy. Equation (1.1) may possess anywhere from zero to an infinite number of solutions, depending on the values of the parameters $\lambda$ and $\epsilon$ under consideration. The determination of the exact number of solutions is of importance to the associated reactive flow problems.

Although physically a number of boundary conditions are relevant, here we shall restrict attention to the (usual) case of homogeneous Dirichlet boundary conditions, $u = 0$ on the boundary $\partial\Omega$. Also, we will consider here only the so-called type A (spherical) geometries. The particular case of $n = 3$ dimensions is physically the most important, and the results given here will be for that case. The physical interest is in the case when all of $u$, $\lambda$, and $\epsilon$ are nonnegative.
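For spherically symmetric solutions on the unit ball, (1.1) reduces to the ode $u'' + (2/r)u' + \lambda f(u) = 0$ with $u'(0) = 0$, $u(1) = 0$, and $f(u) = e^{u/(1+\epsilon u)}$. A standard way to trace the bifurcation diagram, sketched below in Python, is to fix the central temperature $\alpha = u(0)$, integrate $w'' + (2/\rho)w' + f(w) = 0$ from $w(0) = \alpha$ until $w$ first vanishes at $\rho = R(\alpha)$, and use the scaling $u(r) = w(Rr)$, which gives $\lambda = R(\alpha)^2$. This shooting approach, the RK4 step size, and the function names are illustrative choices, not necessarily the method used in [1] or [3].

```python
import math


def arrhenius_source(u: float, eps: float) -> float:
    """f(u) = exp(u/(1 + eps*u)); eps = 0 gives the Bratu nonlinearity exp(u)."""
    return math.exp(u / (1.0 + eps * u))


def first_zero_radius(alpha: float, eps: float, h: float = 1e-3, r_max: float = 50.0) -> float:
    """Shoot w'' + (2/r) w' + f(w) = 0, w(0) = alpha, w'(0) = 0; return the first zero of w."""
    f0 = arrhenius_source(alpha, eps)
    # Start just off the singular point r = 0 using the series w ~ alpha - f(alpha) r^2 / 6.
    r, w, p = h, alpha - f0 * h * h / 6.0, -f0 * h / 3.0

    def rhs(r, w, p):
        return p, -2.0 * p / r - arrhenius_source(w, eps)

    while r < r_max:
        k1 = rhs(r, w, p)
        k2 = rhs(r + h / 2, w + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = rhs(r + h / 2, w + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = rhs(r + h, w + h * k3[0], p + h * k3[1])
        w_new = w + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p_new = p + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if w_new <= 0.0:
            return r + h * w / (w - w_new)  # linear interpolation of the crossing
        r, w, p = r + h, w_new, p_new
    return math.inf


def lam_of_alpha(alpha: float, eps: float = 0.0) -> float:
    """Point (lambda, alpha) on the bifurcation diagram of -Delta u = lambda f(u), unit ball, n = 3."""
    return first_zero_radius(alpha, eps) ** 2


if __name__ == "__main__":
    alphas = [0.1 * k for k in range(1, 31)]
    lams = [lam_of_alpha(a) for a in alphas]
    print(max(lams))  # eps = 0: close to the Frank-Kamenetskii sphere value (about 3.32)
```

For $\epsilon = 0$, the largest $\lambda$ found over a grid of $\alpha$ values should be close to the classical Frank-Kamenetskii critical value of about 3.32 for a sphere; passing $\epsilon > 0$ traces the full Arrhenius curve with the same routine.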

The results to be presented here for $\epsilon > 0$ will be published in more detail in Ash, Eaton, and Gustafson [3]. For $\epsilon = 0$, equation (1.1) has a long, varied, and distinguished history, found in the literature under the names Liouville, Poincaré, Bratu, Frank-Kamenetskii, Gelfand, and Chandrasekhar, among others. See Gustafson [2] for a full historical account, including an exposition of Bratu’s original work on the equation. In [2] many references to other recent work on this problem may be found, and we will not repeat them here. For our initial numerical work on the calculation of critical bifurcation points for equation (1.1) for $\epsilon > 0$, see Eaton and Gustafson [1].

1.1  No  Blow  Up. 

Qualitatively, the case ε = 0 (which we call the Bratu approximation) and the case ε > 0
(which we call the full Arrhenius equation) are fundamentally different. For ε = 0 there exists a
critical λ* beyond which nonsingular solutions do not exist. On the other hand, for ε > 0,
solutions always exist for all positive λ. One way to view this situation is that the act of
approximation (taking ε = 0, i.e., taking the activation energy α to be infinite) introduces a
singularity into the problem. From this view, the singularity is artificial and should not be
confused with solution explosion.
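The qualitative difference between ε = 0 and ε > 0 is already visible in the nonlinearity itself. For ε > 0 and u ≥ 0 we have u/(1 + εu) < 1/ε, so e^{u/(1+εu)} stays below e^{1/ε}, while the Bratu term e^u is unbounded. A bounded right-hand side is one standard route (via sub- and supersolutions) to existence for every λ > 0. The Python sketch below only checks this elementary bound numerically. It assumes the right-hand side of (1.1) has the form exp(u/(1 + εu)) and is not the argument used in [3]. The function names are illustrative.

```python
import math

def arrhenius(u: float, eps: float) -> float:
    """Nonlinearity exp(u / (1 + eps*u)); eps = 0 gives the Bratu term exp(u)."""
    return math.exp(u / (1.0 + eps * u))

def arrhenius_bound(eps: float) -> float:
    """For eps > 0 and u >= 0, u/(1+eps*u) < 1/eps, so exp(1/eps) is an upper bound."""
    return math.exp(1.0 / eps)

if __name__ == "__main__":
    eps = 0.04                                   # activation energy 1/eps = 25
    print(f"bound exp(1/eps) = {arrhenius_bound(eps):.4e}")
    for u in (1.0, 10.0, 100.0, 1.0e3, 1.0e6):
        # exp(u) overflows a double for u above about 709, so report it as infinite there.
        bratu = math.exp(u) if u < 700.0 else math.inf
        print(f"u = {u:>9g}   Arrhenius = {arrhenius(u, eps):.4e}   Bratu = {bratu:.4e}")
```

The Arrhenius values level off just below e^{25} ≈ 7.2 × 10^10 for ε = 0.04, while the Bratu values grow without limit.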

On the other hand, our recent numerical and analytical results [3] show, for very small ε,
a pronounced tendency toward a nearly singular, δ-function-like solution profile centered on a
point in Ω (the center, for spherical geometries) at which very high temperature is concentrated.
Remarkably, the λ at which this occurs is near the next-to-last turning point of the bifurcation
diagram, rather than near the first turning point as one might have imagined from an ε = 0
analysis.

1.2  The  Last  Turning  Point. 

Precise calculation of the last turning points of the bifurcation diagrams for equation (1.1)
is difficult, both numerically and analytically, for small ε > 0.

Figure 1 here, taken from [3], shows the exact bifurcation diagram for (1.1) for ε = 0.04
(i.e., activation energy α = 25). The six turning points are so labelled on the curve. To compute
this critical branching curve it is more convenient, following our approach of [1], to plot the
bifurcation parameter vertically rather than horizontally, as is usually done.

The solution u was found to be closest to a δ-function profile near the 5th turning point.
At the 6th (and last) turning point, which is at the “noise level” along the horizontal axis in
Figure 1, the solution profile snaps back to one very close to that of solutions along the
leftmost first (stable) branch. Thereafter, although it cannot be seen from Figure 1, the final
(stable) branch slowly rises and eventually provides solutions for all λ.

The numerical scheme [1,2,3] that provides these results is called [2] HOC (Higher Order
Calculus) inasmuch as it involves further implicit differentiation of the equations. This scheme
provides an enlarged system, in some ways resembling the so-called inflation methods. In [3] we
also employ a scaling trick which greatly increases the efficiency (shooting with only one
iteration) over that of the original scheme in [1].
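For the spherical case, the one-shot idea can be illustrated with the classical radial scaling. Setting u(r) = w(√λ r) turns −Δu = λf(u), u(1) = 0 on the unit ball into an initial value problem for w that contains no λ. Each choice of u(0) then costs one integration, and λ is read off afterwards as the square of the first zero of w. The Python sketch below is our own illustration of that scaling, not the HOC scheme of [1,2,3]. It rests on several assumptions:

- the domain is the unit ball in 3 dimensions;
- Norm u is taken to be the maximum u(0);
- the integrator is plain RK4;
- the names `lambda_of_u0` and `count_turning_points` are hypothetical.

```python
import math

def f(u: float, eps: float) -> float:
    """Nonlinearity exp(u / (1 + eps*u)); eps = 0 is the Bratu case."""
    return math.exp(u / (1.0 + eps * u))

def _rhs(s: float, w: float, dw: float, eps: float):
    """Radial form of -Laplacian(w) = f(w) in 3-D: w'' = -f(w) - (2/s) w'."""
    return dw, -f(w, eps) - 2.0 * dw / s

def lambda_of_u0(u0: float, eps: float, h: float = 1e-3) -> float:
    """Scaled shooting on the unit ball.

    Integrate w'' + (2/s) w' = -f(w) from w(0) = u0, w'(0) = 0 until w first
    vanishes at s = s*.  Then u(r) = w(s* r) solves -Laplacian(u) = lambda f(u),
    u(1) = 0, with lambda = s*^2 and max u = u(0) = u0.  One integration per u0.
    """
    s = h
    fa = f(u0, eps)
    # Taylor start w = u0 - f(u0) s^2/6 avoids the 2/s singularity at s = 0.
    w, dw = u0 - fa * s * s / 6.0, -fa * s / 3.0
    while True:
        k1w, k1d = _rhs(s, w, dw, eps)
        k2w, k2d = _rhs(s + h / 2, w + h / 2 * k1w, dw + h / 2 * k1d, eps)
        k3w, k3d = _rhs(s + h / 2, w + h / 2 * k2w, dw + h / 2 * k2d, eps)
        k4w, k4d = _rhs(s + h, w + h * k3w, dw + h * k3d, eps)
        w_new = w + h / 6 * (k1w + 2 * k2w + 2 * k3w + k4w)
        dw_new = dw + h / 6 * (k1d + 2 * k2d + 2 * k3d + k4d)
        if w_new <= 0.0:
            s_star = s + h * w / (w - w_new)      # linear interpolation of the zero
            return s_star * s_star
        s, w, dw = s + h, w_new, dw_new

def count_turning_points(lams):
    """Number of sign changes of d(lambda)/d(u0) along a sampled branch."""
    d = [b - a for a, b in zip(lams, lams[1:])]
    return sum(1 for d0, d1 in zip(d, d[1:]) if d0 * d1 < 0)

if __name__ == "__main__":
    u0s = [0.1 * k for k in range(1, 51)]            # u(0) = 0.1, ..., 5.0
    lams = [lambda_of_u0(a, 0.0, h=2e-3) for a in u0s]
    k = max(range(len(lams)), key=lams.__getitem__)
    print(f"Bratu: largest sampled lambda {lams[k]:.3f} at u(0) = {u0s[k]:.1f}")
    print("turning points detected on this range:", count_turning_points(lams))
```

For ε = 0 the sampled branch should peak near λ ≈ 3.32 (the Frank-Kamenetskii value for a sphere) and then turn back. Resolving the many closely spaced turning points for ε = 0.04 or 0.01 would need a much finer u(0) grid and more care with step sizes than this sketch takes.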

Our latest computations [3] have resolved the ε = 0.01 case (unresolved in [2]). In this
case there are 34 turning points, the last occurring at λ ~ 10~^. This means that up to 35
solutions may occur for certain λ; see [3].

1.3  A  Comparison  Theorem. 

Analytical lower bounds for the last turning point have been derived [3] using comparison
techniques. Their proofs depend on and are motivated by the numerical procedures of the HOC
scheme of [1,2,3]. One of them is the following:

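$$\frac{d\lambda}{d(\operatorname{Norm} u)} \neq 0 \quad \text{whenever} \quad \pi^{2} > \mu_0(\operatorname{Norm} u),$$

where

$$\mu_0(\operatorname{Norm} u) = \lambda(\operatorname{Norm} u)\, 4\epsilon^{2}\, e^{(1-2\epsilon)/\epsilon}.$$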

For the case ε = 0.04 this analytical bound gives a lower bound for the last turning point of
λ ≈ 1.57 × 10“'. The numerically computed value of λ at the last turning point (see Table 2) was
λ = 2.44 × 10“'. This is a very favorable comparison, and indicates that the general comparison
method we have used is a good one.
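The constant in the comparison function can be identified with a short calculation. This is our reading of the bound, assuming the criterion has the form π² > λ(Norm u)·4ε² e^{(1−2ε)/ε}, with π² the first Dirichlet eigenvalue of the unit ball. For f(u) = e^{u/(1+εu)} one has f'(u) = f(u)/(1 + εu)². Its maximum over u ≥ 0 is attained at u* = (1 − 2ε)/(2ε²), with value 4ε² e^{(1−2ε)/ε}. A turning point then requires λ ≥ π²/(4ε² e^{(1−2ε)/ε}). The Python check below verifies the closed-form maximum on a grid and evaluates that λ-bound. We make no claim that it reproduces the computation in [3] digit for digit.

```python
import math

def fprime(u: float, eps: float) -> float:
    """Derivative of f(u) = exp(u/(1+eps*u)): f'(u) = f(u) / (1 + eps*u)**2."""
    return math.exp(u / (1.0 + eps * u)) / (1.0 + eps * u) ** 2

def fprime_max(eps: float):
    """Closed form (valid for 0 < eps < 1/2): f' peaks at u* = (1-2eps)/(2eps^2)
    with value 4 eps^2 exp((1-2eps)/eps)."""
    u_star = (1.0 - 2.0 * eps) / (2.0 * eps * eps)
    return u_star, 4.0 * eps * eps * math.exp((1.0 - 2.0 * eps) / eps)

def lambda_lower_bound(eps: float) -> float:
    """Smallest lambda allowed at a turning point under the assumed criterion."""
    return math.pi ** 2 / fprime_max(eps)[1]

if __name__ == "__main__":
    eps = 0.04
    u_star, peak = fprime_max(eps)
    grid_peak = max(fprime(0.01 * k, eps) for k in range(200_000))   # u in [0, 2000)
    print(f"u* = {u_star:.1f}, closed-form max f' = {peak:.6e}, grid max = {grid_peak:.6e}")
    print(f"lambda lower bound for eps = {eps}: {lambda_lower_bound(eps):.3e}")
```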

It would be very interesting and valuable to further investigate the general use of these
numerical and analytical HOC methods on other reactive flow problems, and in particular to study
the implications of these results for Combustion Theory. For example, the presence of two (low
and high temperature) stable branches for λ greater than a (very small) last turning point
suggests an interpretation of explosion as a solution jump rather than a singularity. Also, there
are interesting basic stiffness questions arising in the computations that need more
understanding.


2.  Vortex  Dynamics  in  Cavity  Flows 

In Gustafson and Halasi [4] and [5] an in-depth study of lid-driven cavity flow was carried
out. The emphasis was on following the full dynamics of the unsteady (time-dependent) flow from
an impulsive start. The full (viscous, incompressible) two-dimensional Navier-Stokes equations

$$u_t - \frac{1}{Re}\,\Delta u + (u \cdot \nabla)u = -\nabla p, \qquad \nabla \cdot u = 0 \qquad (2.1)$$

were simulated under a MAC (marker and cell) primitive variable (velocity v and pressure p)
discretization in which considerable care was given to maintaining correct incompressibility and
pressure conditions near the boundary ∂Ω of the cavity Ω.

See  [4]  for  a  full  accounting  of  previous  work  on  this  basic  fluids  problem,  a  fundamental 
geometry  for  the  study  of  the  effect  of  domain  closure  and  corners  on  evolving  fluid  dynamics. 
In [5] the tolerance to varying grid size at Re = 2000 was determined, and then a single long
run of 360,000 time steps was carried out in a depth A = 2 cavity for the relatively high
Reynolds number Re = 10,000.

From  [5]  it  appeared  that  we  had  obtained  a  periodic  solution,  and  hence  had  gone  past  a 
Hopf  bifurcation  at  some  Reynolds  number  between  Re  =  2000,  where  the  solution  became 
steady,  and  Re  =  10,000,  where  it  did  not. 

However, in recently writing the review chapter, Gustafson [6], I looked more closely at the
results of the long run of [5] and came away with a different conclusion; tentatively, I will
assert that I found Feigenbaum's constant δ = 4.669201 in the final oscillations. More details
will be given in [6], but I will explain the finding briefly in 2.3 below.

2.1  Computational  Reliability 

It  is  rather  astounding,  to  those  of  us  who  started  on  a  Royal  McBee  LGP-30  Machine 
(drum  memory,  4096  words,  electronic  tubes  failing  all  the  time  -  but  when  used  as  an  excuse  for 
a coding error, tube failure was seldom the case!), that we may now routinely expect to run a
Poisson solver on a 40 by 80 rectangular grid 360,000 times without an interruption or logical
error in the computation. Such is, however, the case these days.
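Each MAC time step requires a pressure Poisson solve, which is the computation repeated 360,000 times in the run described here. The Python sketch below is a generic illustration only, not the solver used in [4,5]. It applies successive over-relaxation (SOR) to the 5-point discrete Laplacian on a rectangular grid. For brevity it assumes zero Dirichlet data, whereas the pressure equation in a MAC scheme normally carries Neumann-type boundary conditions. The relaxation factor ω = 1.8 is an arbitrary choice.

```python
import math

def sor_poisson(rhs, hx, hy, omega=1.8, tol=1e-10, max_iter=50_000):
    """Solve u_xx + u_yy = rhs with u = 0 on the boundary by SOR.

    rhs is a list of rows, rhs[j][i]; the grid has len(rhs) rows and
    len(rhs[0]) columns, and boundary entries of rhs are ignored.
    Returns (u, iterations)."""
    ny, nx = len(rhs), len(rhs[0])
    u = [[0.0] * nx for _ in range(ny)]
    cx, cy = 1.0 / hx ** 2, 1.0 / hy ** 2
    diag = 2.0 * (cx + cy)
    for it in range(1, max_iter + 1):
        max_change = 0.0
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                # Gauss-Seidel value from the 5-point stencil, then over-relax.
                gs = (cx * (u[j][i - 1] + u[j][i + 1])
                      + cy * (u[j - 1][i] + u[j + 1][i]) - rhs[j][i]) / diag
                change = omega * (gs - u[j][i])
                u[j][i] += change
                max_change = max(max_change, abs(change))
        if max_change < tol:
            return u, it
    raise RuntimeError("SOR did not converge")

if __name__ == "__main__":
    # Manufactured check: u = sin(pi x) sin(pi y) on the unit square.
    n = 21
    h = 1.0 / (n - 1)
    rhs = [[-2.0 * math.pi ** 2 * math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
            for i in range(n)] for j in range(n)]
    u, iters = sor_poisson(rhs, h, h)
    err = max(abs(u[j][i] - math.sin(math.pi * i * h) * math.sin(math.pi * j * h))
              for j in range(n) for i in range(n))
    print(f"{iters} sweeps, max error {err:.2e}")
```

The manufactured solution in the example is there only to check the discretization. The same routine accepts any rectangular grid, including one of the 40 by 80 size mentioned above.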

Given this electronic reliability, we chose to use an extremely stable method (MAC) in the
natural variables p and v. Our goal was to avoid numerical speed-up tricks or stabilizing devices
(no upwinding, etc.) so as to best follow what would be a representation of the physical flow.
Physical experiments, by the way, cannot to date track secondary vortices lower in the cavity
very well, because the intensities fall off too quickly, e.g., by 10““* in a vortex cascade in a
corner.

2.2  Periodicities 

After 180,000 time steps the flow at Re = 10,000 had settled into an oscillating pattern
which clearly was not going to converge to a steady final solution. See the flow histories of [5]
and Figure 3 here. The latter figure, from [6], shows final patterns of the flow after 330,000
time steps. We produced flow portraits only at every 1000 time steps, i.e., at each unit of
dimensionless time t (1 “second”), having used Δt = 0.001. In these flow portraits, the velocity
vectors have been normalized, namely, divided by their magnitude. Thus the portraits are
qualitative; the quantitative magnitudes are too small to show as more than points.
The “period” of the flow (see, e.g., Figure 7 of [5]) had appeared to be somewhere between 4
and 5 seconds.

2.3  Feigenbaum  Frequency 

As mentioned above, looking more closely at the final oscillation of the run of [5], I found
[6] that the “period” of this oscillation is extremely close to Feigenbaum's universal constant
δ = 4.669201. To conclude this I took the portraits at 14 second intervals, as shown in Figure
3 here, and noted that 3δ = 14.007, already knowing the oscillation pattern to be repeating
itself with a period somewhere between 4 and 5 seconds.
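The arithmetic behind the 14 second sampling interval is simply three Feigenbaum periods:

$$3\delta = 3 \times 4.669201 = 14.007603 \approx 14.007 .$$

Successive portraits taken 14 time units apart therefore drift by only 3δ − 14 ≈ 0.0076 time units relative to a period of δ. A pattern with that period looks nearly frozen from frame to frame.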

I have been mentioning this result at conferences since February 1988. The feedback has
been interesting. It is of course objectionable that δ occurs here (it does occur, at least
approximately, coincidence or not) as a “period”, whereas one expects it to occur in a parameter
ratio of increasing Reynolds number differences, for example.

Let  me  note  however  that  there  is  a  steady  local  Reynolds  number  buildup  in  the  region  of 
the  left  wall  oscillation.  Moreover  I  have  found  vestiges  of  at  least  one  earlier  period  doubling 
in  that  critical  flow  region.  And  time  here  is  really  a  dimensionless  iteration  parameter  of  a 
highly  coupled  quadratic  dynamical  system,  as  in  the  period  doubling  theory  of  Feigenbaum. 

Finally, I have linked the Feigenbaum frequency to the actual shedding of vortices in the
high shear interface region. This shedding, of alternately signed tertiary vortices, is shown in
Figure 4, taken from [6]. This shedding started earlier in the flow (t ≈ 92) but could not
maintain itself until later (after period doubling). Looking carefully at the quantitative
velocity output shows a small chaotic fluctuation of trajectories about the (normalized)
qualitative flow portraits, e.g., in an attractor-like fashion.
It would be very interesting and valuable to have the resources to run this flow under
different grid sizes and at aspect ratios and Reynolds numbers deviating slightly from A = 2
and Re = 10,000. A full parametric study would be of great value, as the bifurcation diagram
for cavity flow is not at all known.

The  computational  determination  of  valid  flow  conclusions  for  unsteady  flows  (e.g.,  how 
does  one  really  conclude  periodicity  of  a  flow)  will  be  a  new  chapter  in  numerical  analysis. 
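One elementary diagnostic for such questions is to estimate a period from a sampled signal by locating its upward zero crossings. The signal might be, for instance, a velocity component at one grid point, recorded once per 1000 time steps. This is offered only as an illustration, not as the procedure of [5,6]. The Python sketch below assumes a single dominant oscillation. It is tested on a synthetic signal with period δ plus small noise, not on the cavity data.

```python
import math
import random

def estimate_period(samples, dt):
    """Estimate the period of a sampled oscillation from its upward zero
    crossings (after removing the mean), each located by linear interpolation.
    Errors in locating individual crossings enter only through the first and
    last crossing, so they are divided by the number of cycles spanned."""
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]
    crossings = [(k + x[k] / (x[k] - x[k + 1])) * dt
                 for k in range(len(x) - 1) if x[k] < 0.0 <= x[k + 1]]
    if len(crossings) < 2:
        raise ValueError("need at least two upward zero crossings")
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)

if __name__ == "__main__":
    delta = 4.669201              # Feigenbaum's constant, used here as a test period
    rng = random.Random(0)
    dt = 1.0                      # one portrait per 1000 steps of size 0.001
    signal = [math.sin(2 * math.pi * k * dt / delta) + 0.05 * rng.uniform(-1, 1)
              for k in range(1000)]
    print(f"estimated period: {estimate_period(signal, dt):.4f}")
```

Agreement of such estimates over several disjoint windows is a minimal requirement. It cannot by itself distinguish a true period from a slowly modulated or chaotic oscillation, which is exactly the difficulty raised above.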

3.  Vortex  Interactions  in  Aerodynamic  Flows 

We have begun a program to better understand the physically visualized vortex dynamics
of flows over airfoils, and to investigate new numerical methods for their simulation. Initial
results have been published in Gustafson and Leben [7,8,9].

3.1  Robust  Multigrid  Vortex  Resolution 

In [7,8] we have developed a numerical scheme which has successfully resolved up to 25 of
the vortices in the cascade descending into a corner. This goes beyond the physics (the 25th small
corner vortex has intensity 10“^^^) and is based upon a linear steady (Stokes) fluid model. No
one really knows how many corner subvortices persist in a nonlinear Navier-Stokes corner flow,
but our method has proven its robustness.

3.2  Orthogonal  Grid  Generation 

In [8,9] we have implemented a multigrid method to efficiently generate orthogonal grids
around an airfoil, in body-fitted coordinates. The equations describing the mapping are
$$(f x_\xi)_\xi + (f^{-1} x_\eta)_\eta = 0, \qquad (f y_\xi)_\xi + (f^{-1} y_\eta)_\eta = 0 \qquad (3.1)$$


namely,  two  covariant  Dirichlet  problems  are  iteratively  solved  until  a  sufficient  degree  of 
orthogonality  is  obtained.  The  function  f  is  a  distortion  function  which  must  be  interpolated 
into  the  domain.  For  details  see  [8,9]. 
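With f held fixed, each equation in (3.1) is a linear elliptic problem for x(ξ, η) or y(ξ, η) with the boundary points prescribed. Here (3.1) is taken in the form (f x_ξ)_ξ + (f⁻¹ x_η)_η = 0, and likewise for y. The Python sketch below shows only that inner solve, with plain Gauss-Seidel sweeps on a unit-spaced computational grid and f averaged to half points. The following are omitted, and the function names are hypothetical:

- the update and interpolation of the distortion function f;
- the multigrid acceleration of [8,9].

An orthogonality measure (the largest |cos| of the angle between grid lines) is included because orthogonality is the stopping criterion mentioned in the text.

```python
import math

def solve_mapping(x, y, f, tol=1e-12, max_sweeps=20_000):
    """Gauss-Seidel sweeps for (f x_xi)_xi + (x_eta / f)_eta = 0 and the same
    equation for y, with unit spacing in the computational (xi, eta) plane.
    x, y, f are lists of rows indexed [j][i] (j along eta, i along xi).
    Boundary values of x and y are kept fixed; f is frozen here."""
    ny, nx = len(x), len(x[0])
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                fe = 0.5 * (f[j][i] + f[j][i + 1])        # f at xi half points
                fw = 0.5 * (f[j][i] + f[j][i - 1])
                gn = 2.0 / (f[j][i] + f[j + 1][i])        # 1/f at eta half points
                gs = 2.0 / (f[j][i] + f[j - 1][i])
                d = fe + fw + gn + gs
                for z in (x, y):
                    new = (fe * z[j][i + 1] + fw * z[j][i - 1]
                           + gn * z[j + 1][i] + gs * z[j - 1][i]) / d
                    change = max(change, abs(new - z[j][i]))
                    z[j][i] = new
        if change < tol:
            return sweep
    raise RuntimeError("mapping iteration did not converge")

def max_cos_angle(x, y):
    """Largest |cos(angle)| between xi- and eta-grid lines at interior nodes
    (central differences); 0 means the grid is orthogonal there."""
    worst = 0.0
    for j in range(1, len(x) - 1):
        for i in range(1, len(x[0]) - 1):
            ax, ay = x[j][i + 1] - x[j][i - 1], y[j][i + 1] - y[j][i - 1]
            bx, by = x[j + 1][i] - x[j - 1][i], y[j + 1][i] - y[j - 1][i]
            c = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
            worst = max(worst, abs(c))
    return worst

if __name__ == "__main__":
    nx, ny = 11, 6
    # Target: a 2-by-1 rectangle with boundary points x = 2 xi, y = eta (scaled).
    ex = [[2.0 * i / (nx - 1) for i in range(nx)] for j in range(ny)]
    ey = [[1.0 * j / (ny - 1) for i in range(nx)] for j in range(ny)]
    # Start the interior from a deliberately skewed guess.
    x = [[ex[j][i] + (0.1 * math.sin(i + j) if 0 < i < nx - 1 and 0 < j < ny - 1 else 0.0)
          for i in range(nx)] for j in range(ny)]
    y = [row[:] for row in ey]
    f = [[0.5] * nx for _ in range(ny)]                  # constant distortion function
    print("initial max |cos|:", round(max_cos_angle(x, y), 4))
    sweeps = solve_mapping(x, y, f)
    print(f"after {sweeps} sweeps, max |cos|: {max_cos_angle(x, y):.1e}")
```

In the example, the exact map of a 2 by 1 rectangle is recovered from a skewed initial guess, so the grid becomes orthogonal. For an airfoil, the boundary data and f would come from the body-fitted setup of [8,9].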

Our method of grid generation in principle extends to 3 dimensions, and it would be very
interesting to examine its analytical and computational properties in that case, as well as its
application to flow problems.


3.3  Vortex  Shreddings 

We  have  successfully  simulated  the  full  Navier-Stokes  flow  over  an  airfoil,  in  agreement 
with  physical  experiment.  See  [8,9]  and  Figure  5  here. 

At moderate Reynolds numbers and constant acceleration we have been able to give the
first demonstration of the enhancement of lift by vortex shreddings. These simulations also agree
with physical visualizations.

An example of vortex shredding is given in Figure 6, taken from [9]. Splitting of the primary
positive-lift vortex by the trailing-edge vortex takes place in frame 20, causing a decrease in
lift. Then shredding of the forward secondary negative-lift vortex by the fragment of the primary
vortex returning to the wing restores lift, thereby preventing stall under acceleration.

More  details  may  be  found  in  [9]. 

It would be very interesting and valuable to have the resources to do a full unsteady flow
analysis using locally refined grids and the multigrid FAS feature to better understand the
fundamentals of these vortex phenomena as they occur in aerodynamics.
Figure 1. Bifurcation diagram and turning points for the full Arrhenius equation in 3 dimensions,
for ε = 0.01.
Table 2. Exact values of the turning points for ε = 0.01.
Figure 3. Final oscillation at δ = 4.669201.


Figure 4. Vortex shedding manifestation of the period δ. A small (−) tertiary vor-
Figure 5(a). Constantly Accelerating Flow from Rest

Re_ac = 835, α = 50°
Figure 5(b). Constantly Accelerating Flow from Rest
An Integrated Approach for Scientific Computing
An  Extended  Abstract 


PAUL  S.  WANG* 

Department  of  Mathematical  Sciences 
Kent  State  University 
Kent,  Ohio  44242 


1. Introduction. Modern workstations make it feasible to investigate integrated
computing environments. A workstation-based integrated scientific system should be the
tool of choice for contemporary scientists and engineers. It is relatively simple to bring
numeric, symbolic and graphics computing capabilities to a single computing system. What
is more difficult is to have a truly integrated system where these techniques work together
with very little barrier between them. More importantly, these three techniques should
reinforce one another so that the whole is bigger than the sum of the parts. We briefly
present some recent developments in this direction:

1.  Symbolic  derivation  of  numerical  code  for  finite  element  analysis 

2.  Automatic  numeric  code  generation  based  on  derived  formulas 

3.  Generating  programs  for  parallel  computers 

4.  Interactive  graphing  of  curves  and  surfaces  for  mathematical  formulas 

5. Graphical user interface for mathematical systems

6.  Software  packages  developed 

This extended abstract is partially based on an earlier paper which appeared
in the Proceedings of Compcon 88, the 33rd IEEE Computer Society International
Conference, Cathedral Hill Hotel, San Francisco, California, Feb. 29 - Mar. 4, 1988.

2. Symbolic Derivation for Finite Element Code. We have implemented a
prototype software system to automate the derivation of formulas in finite element
analysis and the generation of programs for the numerical calculation of these formulas.
The generated code can be used with existing numerical packages. This is a general
approach with good potential for many other scientific and engineering problems.

2.1.  FINGER  and  GENTRAN. 

From input provided by the user, either interactively or in a file, FINGER [8] will
derive finite element characteristic arrays and generate FORTRAN code based on
the derived formulas. The initial system handles the isoparametric element family.
Element types include 2-D, 3-D, and shell elements in linear and nonlinear cases.
The system allows easy extension to other finite element formulations.


*Work reported herein has been supported in part by the National Science Foundation under Grant CCR-8~14836
3. GENTRAN. Actual generation of FORTRAN code from symbolic expressions
or constructs is performed by the GENTRAN package [2,7] that we developed. It
is a general purpose FORTRAN code generator/translator. It has the capability of
generating control-flow constructs and complete subroutines and functions. Large
expressions can be segmented into subexpressions of manageable size. Code formatting
routines enable reasonable output formatting of the generated code. Routines
are provided to facilitate the interleaving of code generation and other computations.
Therefore, bits and pieces of code can be generated at different times and
combined to form larger pieces. At the present time, work is going on to construct
a LEX/YACC based code translator which will be much faster than GENTRAN
and which can also produce vectorized f77 code.

4. Techniques for Generating Efficient Code. Our experiences in automatic
code derivation and generation indicate that code generated naively will be voluminous
and inefficient. We have used several techniques to generate better FORTRAN
code.

(a)  Automatic  intermediate  expression  labeling. 

(b)  Using  symmetry  for  generating  functions  and  calls. 

(c)  Common  subexpression  identification. 

(d)  Using  generated  subroutines. 
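Techniques (a) and (c) can be illustrated together. The idea is to find compound subexpressions that occur more than once, give each a generated label, and emit the labels as separate assignments before the final statement. The Python sketch below does this with the standard `ast` module (Python 3.9+ is needed for `ast.unparse`). It is a toy stand-in for what GENTRAN and FINGER do, not their algorithm. The output is only FORTRAN-flavoured (fixed-form indentation, `**` for powers), and the helper names are hypothetical.

```python
import ast
from collections import Counter

class _Labeler(ast.NodeTransformer):
    """Replace every compound subexpression that occurs more than once by a
    temporary T1, T2, ..., emitting one assignment per temporary (bottom-up)."""

    def __init__(self, counts):
        self.counts = counts
        self.names = {}          # structural key -> temporary name
        self.lines = []          # generated assignments, in dependency order

    def _label(self, node):
        key = ast.dump(node)                 # key of the original subtree
        node = self.generic_visit(node)      # label children first
        if self.counts[key] < 2:
            return node
        if key not in self.names:
            name = f"T{len(self.names) + 1}"
            self.names[key] = name
            self.lines.append(f"      {name} = {ast.unparse(node)}")
        return ast.Name(id=self.names[key], ctx=ast.Load())

    visit_BinOp = visit_UnaryOp = visit_Call = _label

def cse_to_fortran(expr: str, target: str) -> list:
    """Return fixed-form FORTRAN-style statements computing `expr` into `target`."""
    tree = ast.parse(expr, mode="eval").body
    counts = Counter(ast.dump(n) for n in ast.walk(tree)
                     if isinstance(n, (ast.BinOp, ast.UnaryOp, ast.Call)))
    labeler = _Labeler(counts)
    body = labeler.visit(tree)
    return labeler.lines + [f"      {target} = {ast.unparse(body)}"]

if __name__ == "__main__":
    for line in cse_to_fortran("(a + b) * sin(a + b) + sin(a + b)**2", "F"):
        print(line)
```

For the sample expression this yields `T1 = a + b`, `T2 = sin(T1)` and `F = T1 * T2 + T2 ** 2`, so `a + b` and `sin(a + b)` are each computed once.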

5.  Generating  Code  for  Parallel  Processors.  Carrying  the  automatic  code 
derivation  and  generation  idea  one  step  further,  current  research  at  Kent  State 
University  addresses  the  derivation  and  generation  of  code  for  advanced  parallel 
computers.  As  mentioned  before,  automatic  generation  of  parallel  code  not  only 
reduces  manual  mathematical  manipulations  but  also  helps  engineers  and  scientists 
who  are  not  computer  experts  take  advantage  of  advanced  parallel  computers. 

We have access to the Carnegie Mellon University (CMU) Warp systolic array
computer [1] through dial-out lines. We are able to make substantial progress
experimentally with the Warp computer because Warp provides a good programming
environment.

W2 is a simple Pascal-like high-level programming language [3] for the Warp
array. W2 hides the low-level details of the Warp computer and provides a high-level
abstraction for the Warp programmer. Using W2, a programmer can specify
programs for each Warp cell and define inter-cell communications. It is the
programmer's responsibility to devise an algorithm and map that algorithm to cell
programs which can be executed in parallel efficiently. This is not a trivial task and
is often central to finding a Warp solution to a problem. W2 is a convenient tool
to program that solution.

6. P-FINGER AND GENW2. To generate key parts of the finite element
computation as parallel code we have constructed the P-FINGER system [4].
P-FINGER runs under VAXIMA and is an enhanced version of FINGER to derive
parallel code. Along with P-FINGER, a code generator package, GENW2 [6], has
been developed by Trevor Tan at Kent State University. GENW2 is a parallel code
generator written in Franz LISP and runs under the VAXIMA symbolic computation
system. Given high-level algorithm specifications and expressions in symbolic
representations, GENW2 outputs W2 code for the Warp systolic array computer.
GENW2 can be used from VAXIMA top-level or invoked directly from the Lisp
level. Generated routines may involve declarations, I/O statements, flow control,
data distribution, subroutines, functions and macros. A code template can be
specified by the user to render the output code in a designated format. The GENW2
package frees us from the syntax details of the target parallel language, W2, so we
can concentrate on devising the parallel algorithms that will map important parts
of finite element analysis onto the Warp. The GENW2 package can also be used
independently.

7. Graphics Display for Scientific Computation. Graphics display will
play an important role in an integrated scientific computing environment. In such
an environment graphics display should be an integral part of the user interface.
A graphics package [9] for this purpose has been implemented to run under
MACSYMA. This package features a highly interactive environment, a multiple window
format and extensive help facilities. The capabilities include full color graphics,
efficient hidden line removal, solid shading, and cubic spline and least-squares curve
fitting.

The package can display curves and surfaces given in either implicit or parametric
form. The equations can be results of prior symbolic derivations. For plots involving
many points, Fortran code is automatically generated to compute the function
values more efficiently. The user has control over color, viewpoint, rotation, hidden
line treatment, etc. of plots. The control is provided alternatively through interactive
menus or commands typed on the keyboard. Plots can be superimposed using
different colors.

The curve fitting capability allows the user to enter data points, which are plotted
as discrete points on the graphics display. A least-squares fitting function
can then be calculated, and the curve it defines overlays the points. The equation for
the fitted curve can be returned for further manipulation.
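A least-squares polynomial fit of this kind reduces to a small linear system. The Python sketch below takes three steps:

1. It forms the normal equations (AᵀA)c = Aᵀy for a polynomial of given degree.
2. It solves them by Gaussian elimination with partial pivoting.
3. It renders the result as a formula string that could be handed back for further manipulation.

Polynomial fitting is our assumption about the kind of fit intended; the package's actual fitting routines are not described in the text, and the function names are hypothetical. For higher degrees the normal equations become ill-conditioned, and a QR-based solve would be the more robust choice.

```python
def polyfit_least_squares(xs, ys, degree):
    """Coefficients c[0..degree] minimizing sum (y - sum_k c[k] x**k)**2, from the
    normal equations (A^T A) c = A^T y solved by Gaussian elimination with
    partial pivoting."""
    m = degree + 1
    # Entries of A^T A and A^T y are power sums of the data.
    M = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    v = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    for k in range(m):
        p = max(range(k, m), key=lambda r: abs(M[r][k]))      # pivot row
        M[k], M[p], v[k], v[p] = M[p], M[k], v[p], v[k]
        for r in range(k + 1, m):
            factor = M[r][k] / M[k][k]
            for c in range(k, m):
                M[r][c] -= factor * M[k][c]
            v[r] -= factor * v[k]
    coef = [0.0] * m
    for k in reversed(range(m)):                                # back substitution
        coef[k] = (v[k] - sum(M[k][c] * coef[c] for c in range(k + 1, m))) / M[k][k]
    return coef

def as_formula(coef):
    """Render fitted coefficients as an equation string in the variable x."""
    terms = []
    for k, c in enumerate(coef):
        power = "" if k == 0 else ("*x" if k == 1 else f"*x**{k}")
        terms.append(f"{c:.6g}{power}")
    return " + ".join(terms)

if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    ys = [1.1, 1.9, 3.2, 4.4, 6.1, 7.9]
    print("y =", as_formula(polyfit_least_squares(xs, ys, 2)))
```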

8. The GI/S Graphical User Interface. The user interface for a scientific
computing system which combines numeric, symbolic and graphical capabilities
should also be of advanced design: it should not only provide functionality to control
computations but also be easy to learn and use. Recent studies in this direction resulted
in the MathScribe [5] and the GI/S [10] user interface systems for REDUCE and
MACSYMA, respectively. These represent the initial steps in an investigation into
suitable user interface designs for complicated scientific computing systems.

The trend is to take full advantage of the capabilities of a modern workstation.
Multiple windows are provided to allow concurrent control of multiple activities. In
GI/S, a mouse is used as a pointing device to select windows and expressions, to pop
up menus and to issue commands. High resolution graphics is used for mathematical
symbols, fonts and interactive plotting of points, curves and surfaces. An emacs-style
editor is active whenever and wherever user input is typed. Mouse-assisted
"cut and paste" allows the user to rearrange text and graphics between windows.
Mathematical expressions are displayed in a textbook-like two-dimensional format.
Using the mouse, subexpressions of mathematical formulas can be selected
interactively. User-specified operations can be applied to selected subexpressions.

8.1.  GI/S  windows 

In the GI/S user interface system, two standard windows are displayed on the
screen when the system begins. These are the input and display windows. The
input window provides a command-line editor and a history mechanism to recall
past commands. Results of computations are displayed in two-dimensional form in
the display window. Other windows may be opened by the user as needed. There
are several different types of windows:

1.  Display  window 

2.  Scratch  window 

3.  Graphics  window,  and 

4.  Help  window. 

Windows are named. Each can be relocated and re-sized interactively by the user.
A corner of each output window contains status information on how the computation
controlled by the window is progressing. The mouse buttons are used for
selection and for appropriate pop-up menus.

8.2  Mouse  apply 

One way to exploit the capability of the mouse in a scientific system is to use
it to enhance mathematical operations. One such operation is singling out a part
of a large expression and applying a user-specified function to it, with the result of
the function replacing the original part in place. Let us call this operation “mouse
apply”.
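The essence of “mouse apply” is a path into the expression tree plus a replacement that rebuilds only the nodes along that path. The Python sketch below uses a hypothetical representation: nested tuples with the operator in slot 0. It also defines a sample user function, `distribute`. GI/S itself operates on MACSYMA expressions, whose internals are not described here.

```python
def subexpr(expr, path):
    """Follow argument indices in `path` down an expression tree whose compound
    nodes are tuples (operator, arg0, arg1, ...)."""
    for k in path:
        expr = expr[k + 1]                  # slot 0 holds the operator
    return expr

def mouse_apply(expr, path, fn):
    """Return a new tree in which the subexpression at `path` is replaced, in
    place, by fn(subexpression); everything else is left unchanged."""
    if not path:
        return fn(expr)
    k, rest = path[0], path[1:]
    args = list(expr[1:])
    args[k] = mouse_apply(args[k], rest, fn)
    return (expr[0], *args)

def distribute(e):
    """A sample user function: a*(b+c) -> a*b + a*c, otherwise unchanged."""
    if isinstance(e, tuple) and e[0] == "*" and isinstance(e[2], tuple) and e[2][0] == "+":
        return ("+", ("*", e[1], e[2][1]), ("*", e[1], e[2][2]))
    return e

if __name__ == "__main__":
    expr = ("+", ("^", "x", 2), ("*", "a", ("+", "b", "c")))   # x^2 + a*(b+c)
    path = (1,)                             # the "mouse selection": second argument
    print("selected:", subexpr(expr, path))
    print("result:  ", mouse_apply(expr, path, distribute))
```

Because the tree is rebuilt only along the selected path, the untouched parts of a large expression are shared rather than copied.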

Studies of user interface design of complicated scientific systems have just begun.
Standards, protocols and conventions are still largely lacking. However, one
can be sure that advances will be made and users will benefit much from the next
generation interface systems.

9. Conclusions. Modern workstations offer a practical way to integrate numeric,
symbolic and graphics computing systems into one comprehensive scientific
computing environment. Operations such as symbolic formula derivation, automatic
numerical program generation, derivation of parallel code, graphics display
of data points and mathematical equations, and advanced user interfaces can work
together and offer many desirable features and capabilities that are otherwise
unavailable. The evolution of such integrated environments will one day provide a
powerful tool for scientists and engineers, substantially increasing their productivity.
ABSTRACT

Even research models of helicopter dynamics often lead to a large number of
equations of motion with periodic coefficients, and Floquet theory is a widely
used mathematical tool for dynamic analysis. Presently, three types of
approaches are used in generating the equations of motion. These are: 1)
general purpose symbolic processors such as REDUCE and MACSYMA, 2) a special purpose
symbolic processor, DEHIM (Dynamic Equations for Helicopter Interpretive
Models), and 3) completely numerical schemes. Comparative aspects of the first
two purely algebraic approaches are studied by applying REDUCE and DEHIM to the
same set of problems. These problems range from a linear model with one degree of
freedom to a mildly nonlinear multi-bladed rotor model with several degrees of
freedom. Further, computational issues in applying Floquet theory are also
studied, which refer to: 1) the equilibrium solution for periodic forced response,
2) the transition matrix for perturbations about that response, and 3) a small
number of eigenvalues and eigenvectors of the unsymmetric transition matrix. That
study shows the following: 1) compared to REDUCE, DEHIM is far more portable and
economical, but it is also less user-friendly, particularly during learning
phases; 2) the problems of finding the periodic response and eigenvalues are well
conditioned.
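For a system z' = A(t)z with period T, Floquet analysis examines the transition (monodromy) matrix Φ(T), obtained by integrating the system over one period starting from the identity. Its eigenvalues, the Floquet multipliers, determine stability: if all have modulus below one, the motion is asymptotically stable. The Python sketch below does this for a hypothetical one-degree-of-freedom Mathieu-type model x'' + (a + b cos t)x = 0, not for a rotor model from this paper. For the large unsymmetric transition matrices discussed here, one would call a library eigensolver instead of the 2 × 2 formula used below.

```python
import math

def monodromy(a: float, b: float, period: float, steps: int = 2000):
    """Transition (monodromy) matrix over one period for x'' + (a + b cos t) x = 0,
    written as z' = A(t) z with z = (x, x'), integrated by classical RK4 from the
    identity matrix.  Returned as phi[row][col]."""
    h = period / steps

    def rhs(t, z):
        return (z[1], -(a + b * math.cos(t)) * z[0])

    cols = []
    for z in ((1.0, 0.0), (0.0, 1.0)):          # one column per unit initial state
        t = 0.0
        for _ in range(steps):
            k1 = rhs(t, z)
            k2 = rhs(t + h / 2, (z[0] + h / 2 * k1[0], z[1] + h / 2 * k1[1]))
            k3 = rhs(t + h / 2, (z[0] + h / 2 * k2[0], z[1] + h / 2 * k2[1]))
            k4 = rhs(t + h, (z[0] + h * k3[0], z[1] + h * k3[1]))
            z = (z[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
                 z[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
            t += h
        cols.append(z)
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

def floquet_multipliers(phi):
    """Eigenvalues of a 2x2 (generally unsymmetric) real matrix, as complex numbers."""
    tr = phi[0][0] + phi[1][1]
    det = phi[0][0] * phi[1][1] - phi[0][1] * phi[1][0]
    root = complex(tr * tr - 4.0 * det) ** 0.5
    return (tr + root) / 2, (tr - root) / 2

if __name__ == "__main__":
    for a, b in ((0.25, 0.0), (1.0, 0.3), (0.25, 0.3)):
        phi = monodromy(a, b, 2 * math.pi)
        m1, m2 = floquet_multipliers(phi)
        status = "bounded" if max(abs(m1), abs(m2)) <= 1.0 + 1e-9 else "unstable"
        print(f"a={a}, b={b}: |multipliers| = {abs(m1):.4f}, {abs(m2):.4f} ({status})")
```

For b = 0 and a = 1/4 the exact Φ(2π) is −I. For this undamped model det Φ(T) = 1 because the trace of A(t) is zero, so stability reduces to checking whether |tr Φ(T)| < 2.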


1.  INTRODUCTION 

Symbolic processing or computer algebra is a highly desirable adjunct of
rotorcraft dynamics research [1-7]. For illustration, we select one research area:
aeroelastic stability in forward flight. Here, the complexity and extent of
the process of deriving the equations of motion merit special mention. We
broadly mention a few stages of that process, bypassing details such as model
description, ordering scheme, perturbation about a periodic orbit, etc. For
example, these stages include the following: 1) partial differential equations of
inplane or lead-lag bending, out-of-plane or flap bending, and elastic torsion, 2)
rotor-support system or fuselage equations, 3) flow-field equations such as those of
downwash dynamics, 4) Galerkin-type discretization to generate ordinary differential
equations with periodic coefficients, and 5) transformation of a complete set
of equations in rotating or non-rotating coordinates, and state variable
representation. Generally, blade elasticity, blade-to-blade coupling and coupling between
the rotor and the rotor-support system introduce a large number of state
variables. In fact, use of nearly 50 state variables has become rather common
even in simplified models of basic research (interpretive models). The
corresponding picture in a stochastic environment, e.g., rotorcraft in turbulence,
is far more demanding. If we apply the second moment stability criterion, we need
to generate "state equations" of order N(N+1)/2 [8-10]. In other words, a
40th-order system requires 820 state equations.
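The count quoted for the second moment equations follows because the moments E[x_i x_j] form a symmetric N × N matrix. Only the entries on and above the diagonal are independent:

$$\left.\frac{N(N+1)}{2}\right|_{N=40} = \frac{40 \cdot 41}{2} = 820 .$$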

*Supported by the U.S. Army Research Office.
Experience both with manual algebra and with computer algebra shows that
computer algebra is the viable alternative to manual algebra. This viability is
expected. After all, computer algebra is as much intrinsic to computers as is
numerical computation, and the required expertise is comparable to that required
in generating numerical results. In fact, once the user is accustomed to a particular
approach or system, computer algebra becomes rather routine, much more so than
numerical computation. Presently, three types of approaches are used: 1)
general purpose or catholic symbolic processors such as MACSYMA and REDUCE [4,7],
2) a special purpose symbolic processor, DEHIM (Dynamic Equations for Helicopter
Interpretive Models) [1-3], and 3) completely numerical schemes [5,6], such as
AGEM (Automatic Generation of Equations of Motions) [6]. This study is restricted
to the first two purely algebraic approaches.

With this as background, we now come to the two main objectives of this paper. The first is to compare DEHIM with a general purpose processor, in this case REDUCE. The comparison is based on our experience in solving the same set of helicopter dynamics problems by the two approaches under reasonably identical conditions. Still, a note of caution is in order. Such a comparison involves numerous variables, many of which defy quantification. It is also subjective to a degree, and it may well be a boundless exercise. Moreover, a multipurpose processor is virtually a finished product: it provides numerous services and is less amenable to evolution. A special purpose processor, on the other hand, provides services restricted to a specialized area, has a modular structure, and is constantly evolving. In spite of many gaps and constraints, the comparison of DEHIM with REDUCE should promote further research on the role and viability of special purpose processors in specialized areas, a research area in which only the barest beginnings have been made. Further, that comparison should contribute to finding better means of comparing one approach with another, including a completely numerical approach. The second objective is to broadly outline the computational aspects of the Floquet theory, particularly for high-order (N > 100) systems. We begin with this second objective.


2.  APPLICATIONS  OF  FLOQUET  THEORY 

Rotorcraft models lead to mildly nonlinear ordinary differential equations, often with a large number of dominant periodic coefficients. The term "mildly nonlinear" implies that nonlinearity is important but does not dominate the solution. Thus, a perturbed linear solution about a periodic orbit is justified. Application of Floquet theory involves the computation of three items [10]: 1) the periodic forced response, 2) the transition matrix for perturbations about that response, and 3) a small number of eigenvalues and eigenvectors of the Floquet transition matrix, which is the state transition matrix at the end of one period. However, for many problems we have to compute the control settings simultaneously and iteratively with the response to obtain a periodic and desired system response, a procedure referred to as vehicle trim. The role of control settings is not studied in this paper. For completeness, we present a brief background of these three items. We then present a set of numerical coordinates, which provide a means of objectively describing the computational issues in the application of Floquet theory. We conclude this section with a discussion of numerical results pertaining to those coordinates.
2.1 Equilibrium State

The transient and forced responses are connected in a direct way: the transient dynamics (about a periodic equilibrium) depend on that equilibrium solution. The Floquet transition matrix provides this connection. To elaborate, we introduce the N×1 state vector x(t) and the T-periodic N×N state matrix A(t). For the N×1 input vector G(t), the linear forced-response system can be expressed as

$$\{\dot{x}(t)\} = [A(t)]\,\{x(t)\} + \{G(t)\} \qquad (1)$$

The N×N state transition matrix Φ(t) is then given by

$$[\dot{\Phi}(t)] = [A(t)]\,[\Phi(t)], \qquad \Phi(0) = I, \qquad 0 \le t \le T \qquad (2)$$

We want the initial state that makes the steady state periodic, {x(0)} = {x(T)}. To compute it, we first compute {x_p(T)}, the nonperiodic solution of the complete equation (1) at t = T for the zero initial state. Since x(T) = Φ(T) x(0) + x_p(T), we then have

$$\{x(0)\} = [I - \Phi(T)]^{-1}\,\{x_p(T)\} \qquad (3a)$$

Thus, the partial derivative of the periodicity error, (x(0) − x(T)), with respect to the initial state x(0) is [I − Φ(T)]. In the nonlinear case, G(t) in equation (1) is replaced by G(x, ẋ, t), and we iterate with an adaptation of equation (3a):

$$[I - \Phi(T)]_{k+1}\,\{x_e(T) - x_e(0)\}_{k+1} = \{x_e(T) - x_e(0)\}_{k} \qquad (3b)$$

where x_e(0) is some N×1 assumed initial state vector that starts the iteration (k = 0). For details see references 11 and 12, which also cover the algorithmic aspects of sequentially perturbing each of the N elements of x_e(0) by a small amount.

Henceforth we will represent the Floquet transition matrix Φ(T) by FTM.
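Equations (1), (2) and (3a) can be exercised on a small example. The sketch below illustrates the idea under stated assumptions and is not the authors' code. It uses a hypothetical two-state damped Mathieu-type oscillator in place of a rotor model, fixed-step fourth-order Runge-Kutta time-marching, and NumPy for the linear algebra. It integrates (2) for the FTM and (1) from the zero initial state, solves (3a) for x(0), and then checks that the response returns to x(0) after one period.

```python
import numpy as np

T = 2.0 * np.pi  # common period of A(t) and G(t)


def A(t):
    # Hypothetical damped Mathieu-type oscillator; not the rotor model of the paper.
    return np.array([[0.0, 1.0],
                     [-(2.0 + 0.4 * np.cos(t)), -0.3]])


def G(t):
    # Hypothetical T-periodic input vector.
    return np.array([0.0, np.cos(t)])


def rk4(f, y0, t0, t1, steps=2000):
    """Fixed-step classical Runge-Kutta integration of y' = f(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, np.array(y0, dtype=float)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


def forced(t, x):
    # Right-hand side of equation (1).
    return A(t) @ x + G(t)


n = 2
# Equation (2) integrated over one period gives the FTM, Phi(T).
ftm = rk4(lambda t, P: A(t) @ P, np.eye(n), 0.0, T)
# Nonperiodic solution of (1) at t = T from the zero initial state.
xp_T = rk4(forced, np.zeros(n), 0.0, T)
# Equation (3a): initial state of the periodic response.
x0 = np.linalg.solve(np.eye(n) - ftm, xp_T)

# Check: one period of (1) starting from x0 returns to x0.
xT = rk4(forced, x0, 0.0, T)
print("x(0) =", x0, " periodicity error =", np.linalg.norm(xT - x0))
```

Fixed-step RK4 applied to a linear system is itself an affine map, so the periodicity error here is at round-off level. With an adaptive integrator it would instead reflect the integration tolerance.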

Solution strategies that couple Floquet theory to the response analysis show considerable similarity among the several trimming methods [11,13,14]. For illustration we choose the method of periodic shooting [11,14]. In that method, we iterate on the initial conditions in order to find those that lead to a periodic solution of the nonlinear equations. The Floquet connection referred to earlier occurs in the iterative scheme, equation (3b), through the matrix [I − FTM]. Thus, the condition number of [I − FTM] quantifies how well-conditioned (or, equivalently, how ill-conditioned) the problem typified by equations (3) is. Section 2.4 introduces the concept and section 2.5 illustrates it on the basis of numerical results.
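The periodic-shooting loop can be sketched in the same spirit. This illustration rests on stated assumptions. A hypothetical Duffing-type oscillator replaces the rotor equations. The FTM about the current iterate is estimated by sequentially perturbing each element of x_e(0), as described above for references 11 and 12. The update is written in the common Newton form, in which [I − FTM] multiplies the correction to x_e(0). Step counts, the perturbation size and the tolerance are illustrative choices.

```python
import numpy as np

T = 2.0 * np.pi


def f(t, x):
    # Hypothetical weakly nonlinear (Duffing-type) oscillator standing in
    # for the nonlinear forced system of the text.
    return np.array([x[1],
                     -(2.0 + 0.4 * np.cos(t)) * x[0] - 0.3 * x[1]
                     - 0.1 * x[0] ** 3 + np.cos(t)])


def rk4(f, y0, t0, t1, steps=1000):
    """Fixed-step classical Runge-Kutta integration of y' = f(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, np.array(y0, dtype=float)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


def flow(x0):
    """State at t = T of the trajectory starting from x0 at t = 0."""
    return rk4(f, x0, 0.0, T)


def perturbation_ftm(x0, eps=1e-6):
    """Approximate FTM about the current orbit by perturbing each element of x0 in turn."""
    base = flow(x0)
    n = len(x0)
    ftm = np.empty((n, n))
    for j in range(n):
        xj = x0.copy()
        xj[j] += eps
        ftm[:, j] = (flow(xj) - base) / eps  # forward-difference column j
    return ftm, base


x0 = np.zeros(2)  # assumed starting state x_e(0) for k = 0
for k in range(20):
    ftm, xT = perturbation_ftm(x0)
    err = xT - x0  # periodicity error x_e(T) - x_e(0)
    if np.linalg.norm(err) < 1e-10:
        break
    # Newton correction: [I - FTM] * dx0 = x_e(T) - x_e(0)
    x0 = x0 + np.linalg.solve(np.eye(2) - ftm, err)

print("iterations:", k, " x_e(0) =", x0, " error =", np.linalg.norm(flow(x0) - x0))
```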

2.2  Floquet  Transition  Matrix  (FTM) 

The FTM is part of the trimming analysis. For an nth-order system, calculating the FTM is equivalent to solving either n initial-value problems of order n, or one initial-value problem of order n² (an n²×1 state vector). These are referred to as n-pass and single-pass computations [15]. Several methods have been exercised to effect this solution, such as rectangular ripple [16], numerical perturbation [17], and, recently, finite elements [18-20]. Of these, single-pass time-marching is by far the most popular. However, the finite element technique in the space-time domain holds much promise [18-20]. Two fruitful areas of research remain. One is a comparison of well-tested IVP codes with the emerging finite element approaches to generating the FTMs. The other is an exposition of the differences among the various finite element formulations.
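The n-pass and single-pass computations differ only in how the same n² unknowns are organized. The sketch below is not one of the codes of references 15-20. It assumes a hypothetical 2 × 2 periodic state matrix and fixed-step RK4 time-marching, and computes the FTM both ways. It then checks the result against Liouville's formula, det Φ(T) = exp ∫₀ᵀ tr A(t) dt.

```python
import numpy as np

T = 2.0 * np.pi
n = 2


def A(t):
    # Hypothetical T-periodic 2 x 2 state matrix (damped Mathieu-type oscillator).
    return np.array([[0.0, 1.0],
                     [-(2.0 + 0.4 * np.cos(t)), -0.3]])


def rk4(f, y0, t0, t1, steps=2000):
    """Fixed-step classical Runge-Kutta integration of y' = f(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, np.array(y0, dtype=float)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


# n-pass: n separate n x 1 initial-value problems, one per unit initial vector.
columns = [rk4(lambda t, x: A(t) @ x, e, 0.0, T) for e in np.eye(n)]
ftm_npass = np.column_stack(columns)


# Single pass: one n^2 x 1 initial-value problem for the stacked columns of Phi.
def rhs_single(t, phi_vec):
    phi = phi_vec.reshape((n, n), order="F")    # unstack columns
    return (A(t) @ phi).reshape(-1, order="F")  # restack columns


phi0 = np.eye(n).reshape(-1, order="F")
ftm_single = rk4(rhs_single, phi0, 0.0, T).reshape((n, n), order="F")

# Both routes agree; Liouville's formula gives an independent check:
# trace A(t) = -0.3, so det Phi(T) = exp(-0.3 T).
print(np.max(np.abs(ftm_npass - ftm_single)))
print(np.linalg.det(ftm_single), np.exp(-0.3 * T))
```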

2.3  Eigenanalysis  of  the  FTM 

For large systems, the crux of the Floquet analysis is the eigenanalysis, which becomes more and more demanding as the order of the FTM increases. The generic QR method (e.g., the EISPACK version for a general matrix) is almost exclusively used for the eigenanalysis [21], owing to its algorithmic robustness and the availability of well-documented computer codes. However, for high-order systems and for stochastic stability problems, such usage presents a computational barrier.

For a general matrix, the QR method is the recommended method for a complete eigenanalysis, as seen from its algorithmic structure (e.g., QR decomposition). However, the operation counts and machine time grow cubically with the order of the FTM, and the storage requirement grows as the square of the system order. Further, the Floquet analysis for stability requires only the dominant characteristic multiplier [10,12]. (In practice, we need a small subset of the dominant and subdominant eigenvalues, as well as the corresponding eigenvectors. This is because of the frequency ambiguity of the Floquet exponents and the need to identify the stability margins of critical modes.) In summary, these restrictions show that the QR method is not practical for large systems. Nor is it practical for the stochastic second moment stability of even a relatively small-order system, which requires an eigenanalysis of order N(N+1)/2. Two promising alternatives to the QR method are 1) the simultaneous iteration method [22-26] and 2) the generalized block-Lanczos method [22,27,28]. However, further research is required to ascertain their viability for the Floquet eigenanalysis, owing to the nonsparsity of the FTMs.
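Simultaneous (subspace) iteration needs only products of the matrix with a block of p vectors. Each sweep therefore costs O(N²p) for a dense FTM rather than the O(N³) of a full QR eigenanalysis. The sketch below is a basic textbook variant, with QR orthonormalization followed by a Rayleigh-Ritz step, written in NumPy. It is not a reproduction of the algorithms in references 22-26. The 40 × 40 nonsymmetric test matrix, with its known spectrum, is hypothetical.

```python
import numpy as np


def simultaneous_iteration(A, p, sweeps=300, seed=0):
    """Ritz approximations to the p dominant eigenvalues of a real matrix A."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], p)))
    for _ in range(sweeps):
        # A enters only through the block product A @ Q.
        Q, _ = np.linalg.qr(A @ Q)
    # Rayleigh-Ritz step: eigenvalues of the small p x p projected matrix.
    return np.linalg.eigvals(Q.T @ A @ Q)


# Hypothetical 40 x 40 nonsymmetric test matrix with a known spectrum:
# a dominant complex pair 0.95*exp(+/-0.5i), a real 0.8, the rest in [-0.3, 0.3].
rng = np.random.default_rng(1)
n, r, th = 40, 0.95, 0.5
B = np.zeros((n, n))
B[:2, :2] = [[r * np.cos(th), r * np.sin(th)],
             [-r * np.sin(th), r * np.cos(th)]]
B[2, 2] = 0.8
B[3:, 3:] = np.diag(rng.uniform(-0.3, 0.3, n - 3))
V = np.eye(n) + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)
A = V @ B @ np.linalg.inv(V)

ritz = simultaneous_iteration(A, p=6)
dominant = sorted(ritz, key=abs, reverse=True)[:3]
expected = [r * np.exp(1j * th), r * np.exp(-1j * th), 0.8]
print(np.round(dominant, 10))
```

Because the subspace dimension p exceeds the number of wanted eigenvalues, the convergence rate is governed by the ratio of the (p+1)th to the third eigenvalue modulus. Whether such rates are adequate for nonsparse FTMs is exactly the open question raised above.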

2.4 Computational Reliability

The trimming analysis (which includes the computation of the FTM) and the eigenanalysis of the FTM are subject to numerical perturbations that are involved and interdependent. For example, the characteristic multipliers (eigenvalues of the FTM) are subject to numerical perturbations due to already existing perturbations in the FTM. It is necessary to know that we are not dealing with an ill-conditioned problem. That is, we need to know that small perturbations in the data do not introduce large changes in the computed result, or at least to have some means of ascertaining the quality of the computations. The problem is ill-conditioned if the condition number is large, say larger than 100, the ideal value being 1. To this end, following Ortega [29], we introduce the following computational reliability coordinates:

1. The matrix condition number of [I − FTM].

2. The condition numbers of the characteristic multipliers and the vector of residual errors of the eigenpairs (eigenvalue and corresponding eigenvector).

The first coordinate concerns the periodic orbit analysis and is a priori. The second set of two coordinates concerns the eigenanalysis of the FTM and is a posteriori. Though the condition number concept has a rigorous analytical basis [29], the corresponding condition number analysis for eigenvectors is too delicate to be practical [30]. Therefore, here we use a combination of the eigenvalue condition number and the residual error of the corresponding eigenpair.

In the sequel we give a very brief account of these numerical coordinates with respect to a generic nonsymmetric real matrix A, right eigenvector x, left eigenvector y and eigenvalue λ. We use the 2-norm for vectors and the spectral norm for matrices, that is,

$$\|x\|_2 = (x^H x)^{1/2} \quad \text{and} \quad \|A\|_2 = \left(\text{max. eigenvalue of } A^T A\right)^{1/2} \qquad (4)$$

where x^H is the Hermitian (complex conjugate) transpose of x and A^T is the transpose of A. We mention in passing that x^H y represents the inner product of x and y. The vectors are normalized such that

$$\|x\|_2 = \|x\| = (x^H x)^{1/2} = 1 = \|y\| = (y^H y)^{1/2} \qquad (5)$$

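The condition number of A, cond(A), is given by

$$\operatorname{cond}(A) = \frac{\left[\text{maximum eigenvalue of } A^T A\right]^{1/2}}{\left[\text{minimum eigenvalue of } A^T A\right]^{1/2}} \qquad (6)$$

and it satisfies the following inequality:

$$1 \le \operatorname{cond}(A) < \infty \qquad (7)$$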
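Equation (6) can be checked against NumPy's SVD-based condition number, because the eigenvalues of AᵀA are the squared singular values of A. The sketch below uses a hypothetical 2 × 2 matrix.

```python
import numpy as np


def cond_eq6(A):
    """Spectral condition number from equation (6)."""
    # A^T A is symmetric positive (semi)definite, so eigvalsh applies;
    # its eigenvalues are returned in ascending order.
    w = np.linalg.eigvalsh(A.T @ A)
    return np.sqrt(w[-1] / w[0])


A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(cond_eq6(A), np.linalg.cond(A, 2))  # both about 14.93
print(cond_eq6(np.eye(3)))                 # the ideal value, 1.0
```

Forming AᵀA squares the condition number. For matrices as ill-conditioned as the FTMs of Table 1 (up to about 10⁶), the singular value decomposition used by `np.linalg.cond` is therefore the numerically safer route. The two agree closely for well-conditioned matrices such as [I − FTM].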
We assume that λ_j is a simple eigenvalue. Then the condition number with respect to each λ_j is given by

$$\operatorname{cond}(\lambda_j) = \left|\, y_j^T x_j \right|^{-1} \qquad (8)$$

where y_j and x_j are the left and right eigenvectors such that

$$A x_j = \lambda_j x_j \quad \text{and} \quad A^T y_j = \lambda_j y_j \qquad (9)$$

It is worth emphasizing that it is y_j^T, and not the Hermitian transpose y_j^H, that is used in equation (8). Referring to the trimming analysis typified by equation (3b), we consider the following symbolic representation:


$$[A + \delta A]\,\{x + \delta x\} = \{b + \delta b\} \qquad (10)$$

For example, A + δA represents [I − FTM], with x and b respectively representing {x_e(T) − x_e(0)}_{k+1} and {x_e(T) − x_e(0)}_k. Under fairly general conditions it can be shown that [29]


$$\frac{\|\delta x\|}{\|x\|} \le \operatorname{cond}(A)\left\{\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right\} \qquad (11)$$


Thus cond(A) represents the maximum magnification of the total relative errors in A and b. That is, the higher the value of cond(A), the greater the sensitivity of equation (3b) to computational perturbations, and consequently the less well-conditioned the problem of finding the periodic initial state.
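For δA = 0, bound (11) reduces to ‖δx‖/‖x‖ ≤ cond(A) ‖δb‖/‖b‖, which holds exactly. The sketch below samples random right-hand-side perturbations for a hypothetical 2 × 2 stand-in for [I − FTM]. It confirms that the observed magnification never exceeds cond(A). The matrix, the vector b and the sample count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [3.0, 4.0]])  # hypothetical stand-in for [I - FTM]
b = np.array([1.0, -1.0])
x = np.linalg.solve(A, b)
kappa = np.linalg.cond(A, 2)

worst = 0.0
for _ in range(10000):
    db = 1e-8 * rng.standard_normal(2)  # perturb b only, so delta A = 0
    dx = np.linalg.solve(A, db)         # exact change in x, by linearity
    magnification = (np.linalg.norm(dx) / np.linalg.norm(x)) / (
        np.linalg.norm(db) / np.linalg.norm(b))
    worst = max(worst, magnification)

print(f"largest observed magnification {worst:.3f} <= cond(A) = {kappa:.3f}")
```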
From equations (9) and (5), the relative residual error e follows:

$$e = \frac{\|A x_j - \lambda_j x_j\|}{\|\lambda_j x_j\|} = \frac{\|r\|}{|\lambda_j|} \qquad (12)$$

where r is the residual error vector. In the following section we present the numerical results on cond(A), cond(λ_j) and e, which are given by equations (6), (8) and (12), respectively.
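Equations (8) and (12) translate directly into a few lines of NumPy. The sketch is ours and assumes a diagonalizable matrix with simple, nonzero eigenvalues. The left eigenvectors are taken from the rows of X⁻¹, where X holds the right eigenvectors. Each row w_j satisfies w_j A = λ_j w_j, so y_j = w_jᵀ meets (9). Both vectors are then normalized as in (5).

```python
import numpy as np


def reliability_coordinates(A):
    """Eigenvalues with cond(lambda_j) from eq. (8) and residual e_j from eq. (12)."""
    lam, X = np.linalg.eig(A)  # columns of X are right eigenvectors
    W = np.linalg.inv(X)       # row j satisfies W[j] @ A = lam[j] * W[j], i.e. A^T y = lam y
    conds, resids = [], []
    for j in range(len(lam)):
        x = X[:, j] / np.linalg.norm(X[:, j])  # normalization (5)
        y = W[j] / np.linalg.norm(W[j])
        conds.append(1.0 / abs(y @ x))         # plain transpose y^T x, not y^H x
        r = A @ x - lam[j] * x                 # residual vector
        resids.append(np.linalg.norm(r) / abs(lam[j]))
    return lam, np.array(conds), np.array(resids)


# A normal (symmetric) matrix has perfectly conditioned eigenvalues ...
_, c_sym, _ = reliability_coordinates(np.array([[2.0, 1.0], [1.0, 3.0]]))
# ... whereas a strongly nonnormal one does not, although its residuals stay tiny.
_, c_non, e_non = reliability_coordinates(np.array([[1.0, 100.0], [0.0, 2.0]]))
print(c_sym, c_non, e_non)
```

The second example shows why the two coordinates are used together. A residual at round-off level certifies the computed eigenpair. It says nothing about the sensitivity of the eigenvalue, which cond(λ_j) measures (here about 100 for both eigenvalues).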

2.5  Discussion  of  Results 

In Table 1, we present the condition numbers of the FTM and [I − FTM]. These are accompanied by the maximum eigenvalue condition number, cond(λ_j)_max, and the corresponding residual error of the eigenpairs. The physical problem is a multibladed rotor system with 3, 4 and 5 rigid blades. As sketched in Figure 1, each blade has two degrees of freedom, flapping or out-of-plane motion and inplane or lead-lag motion, giving four state variables per blade. For the 3- and 4-bladed models, the feedback system from the assumed unsteady aerodynamics or dynamic inflow model introduces 3 additional state variables; for the 5-bladed model, it introduces 5. Thus we have 15 (3×4+3), 19 (4×4+3) and 25 (5×4+5) state variables.

The first column contains the dimensionless velocity parameter μ. The higher the μ, the more dominant the periodic coefficients (and nonlinearity). The second and third columns contain the condition numbers of the FTM and [I − FTM]. The fourth column contains the maximum eigenvalue condition number, that is, the maximum value of cond(λ_j) over all the simple eigenvalues. (For the data in Table 1, all N eigenvalues were simple, i.e., of multiplicity one.) The last column contains the residual error for the eigenpair corresponding to cond(λ_j)_max.

The results are extremely interesting. The FTM is seriously ill-conditioned, and this undesirable feature generally grows with increasing μ. But the crucial ingredient, [I − FTM], as seen through equations (3), is extremely well-conditioned: its condition number is close to the ideal value of one. This means that the problem of finding the periodic orbit, as typified by equations (3), is well-conditioned. These data also show that though the FTM is ill-conditioned (with regard to its inverse), all the eigenvalues of the FTM are well-conditioned. This feature is corroborated by the corresponding residual errors in the fifth and last column.


3.  SPECIAL  PURPOSE  PROCESSOR  DEHIM 

The literature on multipurpose processors such as MACSYMA, REDUCE and MAPLE is extensive. For example, the book by Davenport, Siret and Tournier [31] is encyclopedic. It contains an extensive bibliography and provides an excellent introduction to the general algorithmic basis of computer algebra, and in particular to the use of MACSYMA and REDUCE. By comparison, DEHIM, as is the case with special purpose processors, is restricted to a highly specialized area and merits some introduction. Details on both learning and using DEHIM are given in the Users' Manual [2] and in references 1 and 3. The introduction that follows is highly condensed. Nevertheless, it should help convey the viewpoint that a special purpose processor can be developed as a natural predecessor to programming for numerical computations. It should also show that developing and using such processors is no more involved than programming for numerical results in such specialized areas.

3.1  Description  of  DEHIM 

The four main aspects of DEHIM are the following: 1) algebraic manipulation capabilities, 2) commands, 3) input-output details, and 4) special features.

3.1.1 Algebraic Manipulation Capabilities

The manipulations consist of combining expressions, replacing variables in an expression by designated expressions or relations, and substituting numerical or logical values and tables into expressions. They also include the expansion of composite functions and expressions according to stipulated ordering schemes, and the collection of the coefficients of a specified variable in an expression. Partial differentiation, integration and matrix operations are carried out according to user-supplied rules.
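Expansion according to an ordering scheme can be done term by term, discarding any product whose order exceeds the retention limits as soon as it is formed. The sketch below is a hypothetical Python illustration of this idea, not DEHIM's FORTRAN implementation. Polynomials are dictionaries that map exponent tuples to exact rational coefficients. The orders β̄ = O(ε) and δβ = O(δ), with retention up to ε² and δ, are assumptions. Expanding sin(β̄ + δβ) in this way yields β̄ + δβ − 0.5β̄²δβ, which has the form of the second relation in table RTB1 of Appendix 1.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Hypothetical ordering data, assumed for illustration (not DEHIM input):
# variable -> (group, power of that group's small parameter)
VARS = ("BB", "DB")                   # beta-bar and delta-beta
ORDER = {"BB": (1, 1), "DB": (2, 1)}  # BB = O(eps), DB = O(delta)
LIMIT = {1: 2, 2: 1}                  # retain up to eps**2 and delta**1


def keep(mono):
    """True if a monomial (tuple of exponents over VARS) meets the retention scheme."""
    total = {g: 0 for g in LIMIT}
    for var, exp in zip(VARS, mono):
        group, power = ORDER[var]
        total[group] += power * exp
    return all(total[g] <= LIMIT[g] for g in LIMIT)


def mul(p, q):
    """Product of two polynomials {monomial: coefficient}, truncated term by term."""
    out = {}
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        m = tuple(a + b for a, b in zip(m1, m2))
        if keep(m):  # a dropped term could only generate higher-order terms later
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}


# sin(u) = u - u**3/3! + u**5/5! - ..., with u = BB + DB
u = {(1, 0): Fraction(1), (0, 1): Fraction(1)}
series, u_k = {}, {(0, 0): Fraction(1)}
for k in range(1, 8):
    u_k = mul(u_k, u)  # truncated u**k
    if k % 2 == 1:
        coef = Fraction((-1) ** (k // 2), factorial(k))
        for m, c in u_k.items():
            series[m] = series.get(m, 0) + coef * c

print({m: c for m, c in series.items() if c != 0})
# {(1, 0): Fraction(1, 1), (0, 1): Fraction(1, 1), (2, 1): Fraction(-1, 2)}
```

Truncating inside every product keeps intermediate results small, which is one way to limit expression swell.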
3.1.2 Commands


Commands, such as the input commands, form an important feature of the processor. A brief account, with illustrations in parentheses, is given in Appendix 1. Essentially, the commands are constructed to perform various symbolic manipulations. They are oriented to the algebraic manipulations typical in helicopter dynamics, such as deriving equations according to ordering schemes and transforming them into multiblade (nonrotating) coordinates.
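The transformation into multiblade (nonrotating) coordinates can be illustrated for a three-bladed rotor. The sketch below assumes the standard Fourier (Coleman) multiblade transformation for illustration. DEHIM's own relation tables, such as RMUB in Appendix 1, are not reproduced in this paper. Rotors with four or more blades need additional coordinates, for example a differential coordinate for four blades. The blade angles and azimuth in the usage lines are hypothetical.

```python
import math


def to_multiblade(beta, psi):
    """Rotating blade coordinates -> (b0, b1c, b1s) for a three-bladed rotor.

    beta[k] is the coordinate of blade k at azimuth psi + 2*pi*k/3.
    """
    nb = len(beta)
    if nb != 3:
        raise ValueError("this sketch covers the three-bladed case only")
    psis = [psi + 2.0 * math.pi * k / nb for k in range(nb)]
    b0 = sum(beta) / nb                                            # collective
    b1c = 2.0 / nb * sum(b * math.cos(p) for b, p in zip(beta, psis))  # cosine cyclic
    b1s = 2.0 / nb * sum(b * math.sin(p) for b, p in zip(beta, psis))  # sine cyclic
    return b0, b1c, b1s


def to_rotating(b0, b1c, b1s, psi, nb=3):
    """Inverse transformation: beta_k = b0 + b1c*cos(psi_k) + b1s*sin(psi_k)."""
    return [b0 + b1c * math.cos(psi + 2.0 * math.pi * k / nb)
            + b1s * math.sin(psi + 2.0 * math.pi * k / nb) for k in range(nb)]


beta = [0.05, -0.02, 0.03]  # hypothetical blade flap angles (rad)
psi = 0.7                   # hypothetical reference azimuth (rad)
mb = to_multiblade(beta, psi)
print(mb, to_rotating(*mb, psi))  # the round trip recovers the blade angles
```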

3.1.3  Input-Output  Details 

The input to the program comprises the command names and their parameters, which are in alphanumeric format. The processor gives two sets of outputs. The first set contains the resulting expressions of the algebraic manipulation commands, the perturbed linear equations, and the equations involving multiblade (nonrotating) coordinates. The expressions are printed term by term, one below the other, for easy perusal by the user. The second set contains outputs that are coded FORTRAN statements of the equations, as required in the subsequent numerical analysis. A typical input block diagram is sketched in Fig. 2. Appendix 2 gives a few samples of intermediate (optional) outputs. For example, RYD there represents the y-component of the total time derivative (in a fixed frame of reference) of the dimensionless position vector R, as detailed in Figure 1.
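The second set of outputs, FORTRAN statements built from collected coefficients, can be mimicked in a few lines. This is a hypothetical Python sketch of the idea behind commands such as VAR.COL.COE and COL.COE in Appendix 1, not DEHIM's implementation. Each term of an expression that is linear in the collection variables is represented as a (coefficient, factors) pair, and its factors are already FORTRAN-ready strings. The array name PEQN and the row index are assumptions modeled on the sample output in Appendix 2.

```python
from collections import defaultdict


def collect(terms, cvars):
    """Group the terms of an expression that is linear in the variables cvars."""
    groups = defaultdict(list)
    for coef, factors in terms:
        hits = [f for f in factors if f in cvars]
        if len(hits) != 1:
            raise ValueError(f"term {factors} is not linear in {cvars}")
        rest = tuple(f for f in factors if f != hits[0])
        groups[hits[0]].append((coef, rest))
    return groups


def fortran_statements(groups, cvars, array="PEQN", row=1):
    """Emit one fixed-form FORTRAN assignment per collected coefficient."""
    lines = []
    for col, var in enumerate(cvars, start=1):
        pieces = ["*".join([repr(coef), *rest]) for coef, rest in groups.get(var, [])]
        rhs = "+".join(pieces).replace("+-", "-") or "0."
        lines.append(f"      {array}({row},{col})={rhs}")  # statement starts in column 7
    return lines


# Hypothetical perturbation expression: 1.5*MU*SIN(CY)*DB - 0.5*BB**2*DB + 2.0*DBD
terms = [
    (1.5, ("MU", "SIN(CY)", "DB")),
    (-0.5, ("BB**2", "DB")),
    (2.0, ("DBD",)),
]
cvars = ("DB", "DBD")
for line in fortran_statements(collect(terms, cvars), cvars):
    print(line)
```

This prints `PEQN(1,1)=1.5*MU*SIN(CY)-0.5*BB**2` and `PEQN(1,2)=2.0`, each indented to column 7. Long right-hand sides would also need fixed-form continuation lines to respect the 72-column limit, which this sketch does not handle.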

3.1.4  Special  Features 

These features primarily refer to modular construction and portability. The modular structure permits the introduction of new commands, or the modification of old ones, to accommodate major changes in the formulation. Thus, the same program can be utilised to consider a variety of modifications or extensions of the original analytical model. Implementing a symbolic manipulation system on another computer usually requires a major effort, because the system must take advantage of the specific features of the host computer's hardware and operating system. The present program was originally written in FORTRAN IV and is now in FORTRAN 77. It can be implemented with minimal assistance from the host computer, i.e. by utilising its Fortran compiler. As such, it is highly portable. A reset counter is also incorporated, which erases all previous equations and frees core space for the next equation. If other language facilities such as LISP are available, the required adaptation is routine. Other features include format-free input and the execution of several derivation steps through a single command. We conclude this section by mentioning that it is a routine exercise to incorporate ordering schemes and tables of trigonometric formulae, perturbation schemes and multiblade coordinate transformations [2].


4.  APPLICATIONS  OF  REDUCE  AND  DEHIM 

We begin with the core space required to install these packages. As expected, DEHIM takes far less core space: 83 versus 1573 blocks, as shown in Table 2. However, this comparison should be tempered by the fact that REDUCE provides numerous services, as is typical of a multipurpose processor. DEHIM, in contrast, provides services restricted to deriving the equations of motion of rotorcraft dynamics models. Table 2 presents four cases: one-bladed and three-bladed rotors, each combined with rigid flap and with rigid flap-lag blades. In hover the system has constant coefficients, while in forward flight it has periodic coefficients. The treatment includes the derivation of nonlinear equations, perturbed linear equations for stipulated trim conditions (no trim analysis), and transformation into multiblade or nonrotating coordinates for the three-bladed case. DEHIM is clearly far more economical in machine time. For the linear and multiblade equations this saving grows with N: for the three-bladed flap-lag rotor (N = 12) in forward flight, REDUCE takes 362.00 s against 14.39 s for DEHIM. Our experience with a wide range of problems also shows that DEHIM is remarkably portable.

However, feedback from users shows that during the initial stages DEHIM is far less user-friendly than REDUCE. This is probably due to two reasons. First, the present Users' manual does not take the user through small, gradual steps and merits further elaboration on the basis of highly simplified, graded examples. Second, all the users had used REDUCE earlier. For REDUCE, the exercises of Table 2 were simply another set of problems to which it was applied once again, whereas with DEHIM those exercises were an entirely new experience.
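The CPU-time ratios behind these statements can be read off Table 2 with a short script. The dictionary layout is ours. The two linear flap-lag rows, whose labels are partly lost in the scanned table, are assumed to be the N = 4 hover and forward-flight cases.

```python
# CPU times in seconds from Table 2, as (REDUCE, DEHIM) pairs.
CPU = {
    ("nonlinear", "flap, N=2", "hover"): (6.06, 1.74),
    ("nonlinear", "flap, N=2", "forward flight"): (7.14, 2.11),
    ("nonlinear", "flap-lag, N=4", "hover"): (9.02, 5.25),
    ("nonlinear", "flap-lag, N=4", "forward flight"): (12.00, 8.26),
    ("linear", "flap, N=2", "hover"): (6.59, 2.81),
    ("linear", "flap, N=2", "forward flight"): (8.16, 3.25),
    ("linear", "flap-lag, N=4", "hover"): (15.82, 6.43),
    ("linear", "flap-lag, N=4", "forward flight"): (34.20, 7.91),
    ("multiblade", "flap, N=6", "hover"): (11.17, 3.15),
    ("multiblade", "flap, N=6", "forward flight"): (19.18, 3.54),
    ("multiblade", "flap-lag, N=12", "hover"): (139.00, 9.06),
    ("multiblade", "flap-lag, N=12", "forward flight"): (362.00, 14.39),
}

# Speed-up of DEHIM over REDUCE for each case.
speedup = {case: reduce_t / dehim_t for case, (reduce_t, dehim_t) in CPU.items()}
for case, s in speedup.items():
    print(f"{' / '.join(case):45s} REDUCE/DEHIM = {s:5.1f}")
```

For the multiblade equations the ratio rises from about 3.5 to 5.4 at N = 6 to about 15 to 25 at N = 12. For the nonlinear single-bladed equations, by contrast, it is smaller for flap-lag (N = 4) than for flap (N = 2).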

5.  CONCLUSIONS  AND  FUTURE  WORK 

This study demonstrates the feasibility of programming with the special purpose processor DEHIM to generate the equations of motion of helicopter dynamics models with a priori ordering schemes. The examples treated range from a four-bladed rotor model with flap bending, lag bending and torsion degrees of freedom to a coupled rotor-body system under forward flight conditions [1-3]. That system has 3, 4 and 5 rigid lag-flap blades with hinge offset and dynamic inflow. DEHIM's viability has also been tested by including nonlinear airfoil characteristics and dynamic stall characteristics, based on user-supplied tabulated airfoil data and dynamic stall models. The program generates perturbed linear equations from the nonlinear ordinary (for rigid blades) or partial (for elastic blades) differential equations [1-3].

Compared to the multipurpose processor REDUCE, and with respect to a restricted class of helicopter dynamics problems, DEHIM is far more portable and economical, though it is found to be less user-friendly during the learning phase.

The modular structure of the program allows the programmer to alter the existing modules and to add new subroutines. The program is oriented towards flexibility of application and user modification. Its application-oriented commands keep user input minimal, since many of the formulation steps are built into the commands. Intermediate expression swell is significantly reduced, because the formulation procedures are carried out at the term level rather than at the expression level. DEHIM offers considerable promise in demonstrating that symbolic manipulation can be significantly exploited in deriving the equations of motion of helicopter systems.

Concerning computational reliability, the problem of finding the initial state that guarantees a periodic forced response is found to be well-conditioned. That is, the condition number of [I − FTM], as typified by equations (3), is of the order of one (see Table 1). This is remarkable, because the condition number of the FTM is extremely high by comparison and generally increases with increasing nondimensional flight speed μ. The present study does not include the impact of control inputs. The complete trim problem seeks an augmented vector: the initial state for response periodicity together with the control inputs for the desired response characteristics. How well-conditioned that problem is therefore merits further research.
The problem of finding the eigenvalues in the Floquet analysis is well-conditioned, in that the eigenvalue condition numbers are of the order of one. This finding is further corroborated by the computed residual errors of the corresponding eigenpairs, as typified by equation (12).

Presently, the QR method is almost exclusively used in the Floquet eigenanalysis, and its machine time grows cubically with the system dimension N. This practically precludes the use of the QR method for large systems (N > 100). It equally precludes its use for the stochastic second moment stability of even relatively small-order systems (N = 25), since that case requires an eigenanalysis of order N(N+1)/2. In practice, the Floquet eigenanalysis requires only a small subset of the eigenvalues and eigenvectors. Therefore, though the FTM is generally not sparse, the use of simultaneous iteration and the generalized Lanczos method for the unsymmetric eigenanalysis offers considerable promise.
Table 1: Computational Reliability Coordinates for N = 15, 19 and 25

  μ     COND.(FTM)   COND.(I-FTM)   MAX. COND.(λ_j)   RESIDUAL ERROR
  ------------------------------------------------------------------
  N = 15
  0.0   2.79E02      1.91           1.51              0.109E-14
  0.1   9.61E01      1.95           2.89              0.222E-13
  0.2   6.42E02      1.89           2.32              0.165E-12
  0.3   7.69E03      1.86           2.19              0.264E-11
  0.4   4.50E04      1.84           2.22              0.123E-10
  0.5   1.27E06      1.83           2.07              0.252E-09

  N = 19
  0.0   1.10E02      2.05           1.49              0.770E-15
  0.1   5.59E01      2.02           3.77              0.137E-13
  0.2   6.08E02      1.92           2.32              0.136E-12
  0.3   7.79E03      1.88           2.13              0.146E-11
  0.4   9.77E04      1.87           2.10              0.146E-10
  0.5   1.36E06      1.87           2.09              0.252E-09

  N = 25
  0.0   3.72E02      2.05           3.40              0.272E-14
  0.1   1.57E02      1.86           5.37              0.890E-14
  0.2   5.94E02      1.91           3.38              0.189E-14
  0.3   7.65E03      1.89           3.57              0.889E-12
  0.4   9.07E04      1.87           3.39              0.273E-14
  0.5   1.09E06      1.87           3.40              0.130E-14
Table 2: Applications of DEHIM and REDUCE (VAX 8800)

                                                          REDUCE     DEHIM
  Core space (in blocks)                                    1573        83

  CPU time (in secs.)
  NONLINEAR EQUATIONS (single-bladed rotor)
    Rigid flap (N=2)            Hover                       6.06      1.74
                                Forward flight              7.14      2.11
    Rigid flap-lag (N=4)        Hover                       9.02      5.25
                                Forward flight             12.00      8.26
  LINEAR EQUATIONS (single-bladed rotor)
    Rigid flap (N=2)            Hover                       6.59      2.81
                                Forward flight              8.16      3.25
    Rigid flap-lag (N=4)        Hover                      15.82      6.43
                                Forward flight             34.20      7.91
  MULTIBLADE EQUATIONS (three-bladed rotor)
    Rigid flap (N=6)            Hover                      11.17      3.15
                                Forward flight             19.18      3.54
    Rigid flap-lag (N=12)       Hover                     139.00      9.06
                                Forward flight            362.00     14.39
Appendix 1: COMMANDS

• To input an expression
:%FCT=5.*X**4*Y**5+A*SIN(BT)*COS(BT)$
(function fct = 5x⁴y⁵ + a sin β cos β)

• To input a matrix
:?TR(3,3)=COS(P);0.;-SIN(P);0;1;0;SIN(P);0;COS(P)$
(matrix of size 3×3)

$$[T_r] = \begin{bmatrix} \cos P & 0 & \sin P \\ 0 & 1 & 0 \\ -\sin P & 0 & \cos P \end{bmatrix}$$

• To input a relation table
:REL.TAB:#RTB1:SIN(ZE)=ZE+DZ;SIN(BT)=BB+DB-.5*BB**2*DB$
(a table of relations named RTB1 containing sin ζ = ζ + δζ and sin β = β̄ + δβ − 0.5β̄²δβ)

• To assign orders of magnitude to the variables
:ORD.MAG:(BB,1,2),(DB,2,1)$
(the variable β̄ belongs to group 1 with order of magnitude ε², ε being a measure of the magnitude of group 1 variables; δβ belongs to group 2 with order of magnitude δ, δ being a measure of the magnitude of group 2 variables)

• Scheme for term retention
:TER.RET:#TSCHM=(1,2),(2,1)$
(the term retention scheme TSCHM specifies that, during expansions, only terms whose magnitude is limited to ε² for group 1 variables and to δ for group 2 variables are retained)
• Algebraic manipulation of matrices
:?A1[DIFF,TAU,BT;SUBS,#RTB1]=?A2(TRAN)*?A3(INTG,RB,0.,1)[SUBS,#RTB2;TRSH,#TSCHM]+?B1*?B3(DIFF,BTD)$
(matrix [A1] is obtained in three steps. First, form {[A2]^T (∫₀¹ [A3] dr)}, with substitution of the table of relations RTB2 and application of the retention scheme TSCHM, and add [B1] ∂[B3]/∂β̇. Second, differentiate the result with respect to τ and β. Third, substitute the relations of table RTB1.)

• To input variables whose coefficients are to be collected
:VAR.COL.COE:#CVAR=DB,DBD$
(define a string of variables CVAR, which contains the names of the variables DB and DBD)

• To collect the coefficients of an expression
:COL.COE:%A1(#CVAR,FORT,PTEQ)$
(collect the coefficients of the variables defined in the string of variables CVAR in the function A1, transform the coefficient expressions into FORTRAN statements, and store the details with index PTEQ)

• Multiblade coordinate transformation
:MUL.TRA:?MUEQ[#RMUB,4]=?PE$
(transforms the expression PE, which is written in rotating coordinates, into an expression in nonrotating coordinates (multiblade coordinates for the blades), using the relations defined in relation table RMUB for a 4-bladed rotor system)
Appendix 2: INTERMEDIATE OUTPUT


Details of expression RYD
1.000000*COS(CY)*HEPS
+1.000000*COS(BT)*COS(ZE)*COS(CY)*RB
-1.000000*BTD*SIN(BT)*COS(ZE)*SIN(CY)*RB
-1.000000*COS(BT)*ZED*SIN(ZE)*SIN(CY)*RB
-1.000000*COS(BT)*SIN(ZE)*SIN(CY)*RB
-1.000000*BTD*SIN(BT)*SIN(ZE)*COS(CY)*RB
+1.000000*COS(BT)*ZED*COS(ZE)*COS(CY)*RB

(Output of the details of the expression

Ry = e cos ψ + r cos β cos ζ cos ψ - r β̇ sin β cos ζ sin ψ - r ζ̇ cos β sin ζ sin ψ - r cos β sin ζ sin ψ - r β̇ sin β sin ζ cos ψ + r ζ̇ cos β cos ζ cos ψ,

where Ry is the y-component of the time derivative of the position vector.)
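A quick way to sanity-check the printed expression is to compare it with a numerical derivative. The seven terms are consistent with differentiating y = e sin ψ + r cos β sin(ζ + ψ) with respect to nondimensional time, taking ψ̇ = 1. That position formula is our reconstruction and is not given in the source, and the blade motions below are hypothetical.

```python
import math


def ry_printed(e, r, beta, beta_dot, zeta, zeta_dot, psi):
    """The seven terms of RYD as printed (HEPS=e, RB=r, BT=beta, ZE=zeta, CY=psi)."""
    cb, sb = math.cos(beta), math.sin(beta)
    cz, sz = math.cos(zeta), math.sin(zeta)
    cp, sp = math.cos(psi), math.sin(psi)
    return (e * cp
            + r * cb * cz * cp
            - r * beta_dot * sb * cz * sp
            - r * cb * zeta_dot * sz * sp
            - r * cb * sz * sp
            - r * beta_dot * sb * sz * cp
            + r * cb * zeta_dot * cz * cp)


def ry_from_position(e, r, beta_of, zeta_of, psi, h=1e-5):
    """Central difference of the assumed y-position
    y(psi) = e sin(psi) + r cos(beta) sin(zeta + psi), with nondimensional time psi."""
    def y(p):
        return e * math.sin(p) + r * math.cos(beta_of(p)) * math.sin(zeta_of(p) + p)
    return (y(psi + h) - y(psi - h)) / (2.0 * h)


# Hypothetical blade motions and evaluation point.
beta_of = lambda p: 0.1 * math.sin(p)
zeta_of = lambda p: 0.05 * math.cos(2.0 * p)
psi = 0.9
printed = ry_printed(0.05, 0.7, beta_of(psi), 0.1 * math.cos(psi),
                     zeta_of(psi), -0.1 * math.sin(2.0 * psi), psi)
numeric = ry_from_position(0.05, 0.7, beta_of, zeta_of, psi)
```

Agreement to roughly the finite-difference error indicates that no term was lost or sign-flipped in the symbolic differentiation.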
Details of Matrix AA (3x1)

Terms of element (1,1)
1.*SIN(BT)*COS(CY)+5.*SZE
Terms of element (3,1)
10.5*SIN(CY)*LOG(X)

(output of matrix AA (size 3x1), which corresponds to

    | sin β cos ψ + 5 sin ζ |
    | 0                     |
    | 10.5 sin ψ log x      | )


FORTRAN Statements

PEQN(5,1)=1.125*GAM2+1.5*MU*SIN(CY)**2
PEQN(6,3)=5.*BETA*COS(BETA)

(FORTRAN statements for the equations P51 and P63, which correspond to 1.125γ2 + 1.5μ sin²ψ and 5β cos β, respectively, where γ2 is an aerodynamic force integral and μ is a dimensionless speed parameter)
VARIABLES

Symbol   Description                                       Representation
------   -----------------------------------------------   --------------
R        Position vector from hub axis                     R
e        Hinge offset/rotor radius                         HEPS
r        Location of the blade element from hinge axis     RB
         (dimensionless: r/(rotor radius - hinge offset))
R_h      Hub elasticity parameter
θ        Blade pitch setting


DYNAMIC INFLOW BLOCK DIAGRAM

Fig. 1 ROTOR SYSTEM WITH DYNAMIC INFLOW FEEDBACK
VARIOUS RELATIONS / FORMULAE


FIG. 2 INPUT BLOCK DIAGRAM
HYPERBOLIC WAVES AND
NONLINEAR  GEOMETRICAL  ACOUSTICS 
John  K.  Hunter 
Colorado  State  University 

ABSTRACT  This  paper  reviews  asymptotic  methods  for  weakly  nonlinear  hyperbolic 
waves.  When  applied  to  compressible  fluid  flows,  these  methods  give  a  theory  of 
nonlinear  geometrical  acoustics. 

1  INTRODUCTION 

Nonlinear  wave  propagation  is  a  unified  scientific  field  largely  because  the  basic 
phenomena  are  described  by  a  relatively  small  number  of  canonical  equations.  These 
equations  can  be  derived  systematically  from  the  primitive  equations  modelling  the 
wave  motion  by  means  of  formal  asymptotic  expansions.  The  aim  of  this  paper  is  to 
summarise  the  canonical  equations  for  weakly  nonlinear  hyperbolic  waves,  with  or 
without  the  inclusion  of  small  dissipative  effects.  We  apply  these  results  to  the 
equations  of  motion  of  a  compressible  fluid,  which  gives  a  theory  of  nonlinear 
geometrical  acoustics  (NGA). 

Exact  solutions  of  the  canonical  equations  are  of  particular  interest.  As  well  as 
providing  quantitative  asymptotic  solutions,  they  often  give  considerable  insight  into 
the  physical  processes  of  the  wave  motion.  We  therefore  note  what  is  currently 
known  about  exact  solutions  of  the  canonical  equations  presented  here. 

Unfortunately, in comparison with the situation for dispersive waves, few exact solutions are known. For example, the cylindrical KdV equation, the KP equation, and the three-wave resonant interaction equations are all solvable by the inverse scattering transform [1], [11]. None of the corresponding asymptotic equations for dissipative waves can be solved completely.

In  section  2,  we  derive  asymptotic  equations  for  a  single  wave.  The  result  is 
the  kinematic  wave  equation  (2.4)  for  inviscid  waves,  and  the  generalized  Burgers' 
equation  (2.19)  for  viscous  waves.  In  section  3,  we  include  diffraction.  This  gives 
the  unsteady  transonic  small  disturbance  equation  (3.5),  and  the  Kuznetsov  equation 
(3.7).  In  section  4,  we  consider  wave-wave  and  wave-mean  field  interactions,  leading 
to  the  integro-differential  equations  (4.4)  and  (4.13).  Finally,  in  section  5,  these 
equations  are  specialized  to  the  case  of  sound  waves  in  a  fluid. 

Our  references  are  biased  towards  review  articles  and  books,  where  they  are 
available.  These  may  be  consulted  for  references  to  the  original  papers.  For  other 
reviews  of  asymptotic  theories  for  hyperbolic  waves,  see  Nayfeh  [33]  and  Majda  [28]. 


2  SINGLE  WAVE  EQUATIONS 

2.1  The  kinematic  wave  equation 

Let  us  begin  by  considering  a  hyperbolic  system  of  conservation  laws  in  one  space 
dimension, 

(2.1)  $U_t + f(U)_x = 0,$

where U(x,t) ∈ ℝ^m and f : ℝ^m → ℝ^m. The weakly nonlinear expansion of solutions of (2.1) is

(2.2)  $U = \epsilon\, a\big(\epsilon^{-1}(x-\lambda t), x, t\big)\, r + O(\epsilon^2), \qquad \epsilon \to 0^+ \text{ with } t = O(1).$

Here, λ is an eigenvalue of

$A = \nabla_U f(0),$

and r is a corresponding eigenvector. We denote a left eigenvector by ℓ and normalize it so that ℓ·r = 1. We assume throughout that λ is a simple eigenvalue (see [5], [17] for multiple characteristics). The wave amplitude a(θ,x,t) is determined by means of the method of multiple scales [21]. In this method, (θ,x,t) are treated as independent variables, and the final asymptotic solution is obtained by evaluating at θ = ε^{-1}(x - λt). The equation for a is found to be

(2.3)  $a_t + \lambda a_x + M a a_\theta = 0.$

In (2.3), the coefficient of the nonlinear term is

$M = \nabla_U \lambda(U)\cdot r(U)\big|_{U=0} = \ell\cdot\nabla_U^2 f(0)(r,r) = \sum_{i,j,k=1}^{m} \ell_i\, \frac{\partial^2 f_i}{\partial U_j\,\partial U_k}(0)\, r_j r_k.$

Thus, M ≠ 0 if the wave is genuinely nonlinear [23]. If M = 0, nonlinear effects are negligible to leading order in ε and over times for which the asymptotic solution (2.2) is valid. Nonlinear effects may be retained by rescaling the amplitude in (2.2). In this paper we always assume that M ≠ 0. For a discussion of other cases see [24], [40].
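The coefficient M can be computed numerically from the flux alone, because ∇²_U f(0)(r,r) is the second derivative of f(sr) with respect to s at s = 0. The sketch below applies this to the p-system v_t - u_x = 0, u_t + p(v)_x = 0 with the assumed pressure law p(v) = v^{-γ}, linearized about the constant state (v, u) = (1, 0). This corresponds to the shift of dependent variables that makes U = 0 the base state. The eigenvectors are worked out by hand for this example.

```python
import math


def nonlinear_coefficient(f, u0, r, l, h=1e-4):
    """M = l . D^2 f(u0)(r, r), computed as d^2/ds^2 [l . f(u0 + s r)] at s = 0
    with a central second difference."""
    def l_dot_f(s):
        values = f([ui + s * ri for ui, ri in zip(u0, r)])
        return sum(li * vi for li, vi in zip(l, values))
    return (l_dot_f(h) - 2.0 * l_dot_f(0.0) + l_dot_f(-h)) / h ** 2


gamma = 1.4
p = lambda v: v ** (-gamma)          # assumed pressure law
f = lambda U: [-U[1], p(U[0])]       # p-system flux, U = (v, u)
c = math.sqrt(gamma)                 # sound speed at v = 1, since p'(1) = -gamma
r = [1.0, -c]                        # right eigenvector for lambda = +c
l = [0.5, -0.5 / c]                  # left eigenvector, normalized so that l . r = 1
M = nonlinear_coefficient(f, [1.0, 0.0], r, l)
# Analytic value: -p''(1) / (2c) = -gamma (gamma + 1) / (2c), about -1.420
```

A nonzero value confirms that this acoustic wave is genuinely nonlinear. The sign of M only fixes which way the waveform steepens.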

The amplitude a in (2.2) depends on a "fast" phase ε^{-1}φ, where φ = x - λt, and on "slow" variables (x,t). In this form, (2.2) is a weakly nonlinear extension of the high-frequency geometrical optics expansion. Therefore, one name for this method is weakly nonlinear geometrical optics (WNGO) [17]. Alternatively, note that (2.1) is invariant under the change of variables

$\tilde x = \epsilon^{-1}x, \qquad \tilde t = \epsilon^{-1}t.$

In these variables, (2.2) is

$U = \epsilon\, a(\tilde x - \lambda\tilde t, \epsilon\tilde x, \epsilon\tilde t)\, r + O(\epsilon^2), \qquad \epsilon\to 0^+ \text{ with } \tilde t = O(\epsilon^{-1}),$

which corresponds to a long-time/far-field expansion. It describes a wave which changes slowly in a frame of reference moving with the linearized wave velocity λ. This point of view is adopted in the perturbation-reduction method (Taniuti et al. [44]). Thus, provided that there are no lower order source terms in (2.1), the high-frequency and far-field expansions are equivalent.

If  a  =  a(^,t)  is  independent  of  x,  then  (2.2)  describes  a  wave  which  is  uniform 
in  space  and  changing  slowly  in  time.  This  form  is  appropriate  for  initial  value 
problems.  If  a  =  a(^,x)  is  independent  of  t,  then  (2.2)  describes  a  wave  which  is 
modulated  in  space,  and  distance  x  is  a  "time-like"  variable  in  (2.3).  This  form  is 
appropriate  for  boundary  value  problems. 

The wave-form of the wave in (2.2) is described by the dependence of a on θ. There are two main cases: oscillatory waves, and wavefronts. For oscillatory waves, a is a periodic (or almost periodic) function of θ, and (2.2) is valid in the limit ε → 0+ with x,t = O(1).

The solution represents a rapidly oscillating wave field, with frequency and wavenumber of the order ε^{-1}, which is modulated over distances and times of the order one.

A typical example of a wavefront expansion is when

$a(\theta,x,t) \to a_\pm(x,t) \qquad \text{as } \theta \to \pm\infty,$

as in a viscous shock profile. In this case, (2.2) is valid near the wavefront x = λt, but not necessarily elsewhere. Formally, the limit is

$\epsilon \to 0^+ \text{ with } t = O(1) \text{ and } x - \lambda t = O(\epsilon).$

To put (2.3) in a standard form, we define

$u(\theta, t, x-\lambda t) = M a(\theta, x, t).$

This implies that u(x,t,φ) satisfies

(2.4)  $u_t + u u_x = 0.$

Here, we have renamed the independent variables (x in (2.4) is the phase variable, and not the original space variable) and φ occurs as a parameter. Equation (2.4) is called the kinematic wave equation, or the inviscid Burgers' equation. It is the canonical equation for a weakly nonlinear, genuinely nonlinear, hyperbolic wave.


Weak solutions of (2.4), in the conservative form

$u_t + \left(\tfrac12 u^2\right)_x = 0,$

continue to provide formally valid asymptotic solutions of (2.1) after shocks form [3].

Equation (2.4) can be solved exactly, in principle, by using the method of characteristics and shock fitting [46].
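The smooth part of this construction is easy to sketch. Since u is constant along the characteristics x = ξ + u₀(ξ)t, the solution before breaking is found by solving for the foot ξ. The breaking time is t_b = -1/min u₀′. The function names are ours, and shock fitting after t_b is not implemented.

```python
import math


def breaking_time(du0, xs):
    """t_b = -1 / min u0'(x) over the sample points xs (infinite if u0' >= 0 there)."""
    slope = min(du0(x) for x in xs)
    return math.inf if slope >= 0 else -1.0 / slope


def burgers_characteristics(u0, x, t, tol=1e-12, max_iter=200):
    """Smooth solution of u_t + u u_x = 0 at (x, t), valid for 0 <= t < t_b.

    u is constant along the characteristic x = xi + u0(xi) t, so we solve
    g(xi) = xi + t u0(xi) - x = 0 for the foot xi by bisection; g is increasing
    before breaking, so the root is unique.
    """
    g = lambda xi: xi + t * u0(xi) - x
    lo, hi = x - 1.0, x + 1.0
    while g(lo) > 0:          # widen the bracket to the left
        lo -= 2.0 * (hi - lo)
    while g(hi) < 0:          # widen the bracket to the right
        hi += 2.0 * (hi - lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return u0(0.5 * (lo + hi))


# u0(x) = -sin x steepens fastest at x = 0, where u0' = -1, so t_b = 1.
tb = breaking_time(lambda s: -math.cos(s), [2.0 * math.pi * k / 1000 for k in range(1001)])
u_half = burgers_characteristics(lambda s: -math.sin(s), 1.0, 0.5)
```

Beyond t_b the characteristics cross. A weak solution then needs a shock placed by the equal-area rule and moving with speed (u₋ + u₊)/2. That is the shock-fitting step referred to above, and this sketch does not attempt it.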


2.2  Nonplanar  waves 

The  method  of  the  previous  section  generalizes  to  nonplanar  waves  in  several  space 
dimensions  and  to  nonuniform  media  which  vary  slowly  over  a  wavelength.  Consider 
a  hyperbolic  system  of  conservation  laws, 

(2.5)  $\sum_{i=0}^{n} \frac{\partial}{\partial x_i} f_i(x,U) + g(x,U) = 0,$

where x = (x_0,...,x_n) ∈ ℝ^{n+1}, U(x) ∈ ℝ^m and f_i, g : ℝ^{n+1} × ℝ^m → ℝ^m. We assume that U = 0 is a solution of (2.5). Given a smooth nonzero solution of (2.5), this can always be arranged by a shift of dependent variables.

The weakly nonlinear solution of (2.5) is

(2.6)  $U = \epsilon\, a\big(\epsilon^{-1}\phi(x), x\big)\, r(x) + O(\epsilon^2), \qquad \epsilon \to 0^+ \text{ with } x = O(1).$

This solution represents a small amplitude, locally plane wave. To show the latter fact, we expand U for x close to a fixed value y. From (2.6) we find that

(2.7)  $U = \epsilon\, a\big(\epsilon^{-1}\phi(y) + \epsilon^{-1}k\cdot(x-y),\, y\big)\, r(y) + o(\epsilon), \qquad \epsilon \to 0^+ \text{ with } x - y = o(\epsilon^{1/2}),$

where

$k = \nabla\phi(y) = (\phi_{x_0},\dots,\phi_{x_n})\big|_{x=y}.$

For fixed y, (2.7) is a plane wave with frequency-wavenumber vector ε^{-1}k. The choice of scaling in (2.7), namely

dimensionless wave amplitude = O(ε),
relative change in frequency per period = O(ε),

leads to a balance between nonlinear and nonplanar effects. For amplitude ≪ ε, one obtains linear geometrical optics in a first approximation; for ε ≪ amplitude ≪ 1 one obtains the weakly nonlinear plane wave solution described in section 2.1.

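Equation (2.6) is an asymptotic solution of (2.5) if:

(a) The phase φ(x) solves the eikonal equation

(2.8)  $\det\Big[\sum_{i=0}^{n} \phi_{x_i} A_i\Big] = 0,$

where

(2.9)  $A_i(x) = \nabla_U f_i(x,0).$

We denote right and left null-vectors of the matrix in (2.8) by r(x) and ℓ(x).

(b) The amplitude a(θ,x) solves the transport equation

(2.10)  $a_s + M a a_\theta + N a = 0,$

where

(2.11)  $\frac{d}{ds} = \sum_{i=0}^{n} (\ell\cdot A_i r)\,\frac{\partial}{\partial x_i}, \qquad M(x) = \sum_{i=0}^{n} \phi_{x_i}\,\ell\cdot\nabla_U^2 f_i(x,0)(r,r), \qquad N(x) = \sum_{i=0}^{n} \ell\cdot\frac{\partial}{\partial x_i}(A_i r) + \ell\cdot\nabla_U g(x,0)\, r.$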

To describe the structure of these equations, we introduce the rays associated with φ. The rays are curves in space-time with parametric equation x = x(s) defined by

$\frac{dx_i}{ds} = \ell\cdot A_i r, \qquad i = 0,\dots,n.$

There is an n-parameter family of rays, which we label by β(x) ∈ ℝ^n. The above equations are valid in regions of space-time where the transformation between ray coordinates (s,β) and x is smooth. We shall write functions of x as functions of (s,β) whenever it is convenient to do so.

The eikonal equation (2.8) can be written as a system of ODEs along the rays associated with φ [17]. Equation (2.10) may be regarded as an n-parameter family of PDEs in one "space" variable, with one PDE for each ray. The time-like variable, s, is a one-to-one function of arclength along a ray and the space-like variable θ is the fast phase. The coefficients in (2.10) are functions of s (and the ray parameters β) but are independent of θ.

The  eikonal  equation  (2.8)  states  that  the  local  frequency-wavenumber  vector 
satisfies  the  local,  linearized,  high-frequency  dispersion  relation  of  (2.5).  The 
transport  equation  is  an  energy  balance  equation  for  the  wave. 

The velocity of the rays (with x_0 = t = time),

$\frac{dx_i}{dt} = \frac{\ell\cdot A_i r}{\ell\cdot A_0 r}, \qquad i = 1,\dots,n,$

is called the group velocity of the wave. The phase velocity is the normal velocity of the wavefronts φ = constant. It is given by

$-\frac{\phi_t}{|\nabla\phi|^2}\,\nabla\phi,$

where

$\nabla\phi = (\phi_{x_1},\dots,\phi_{x_n}).$

The phase and group velocities are not the same, in general. Equation (2.10) shows that wave energy propagates at the group velocity.
Because of the importance of rays, these theories are often called ray methods. The geometry of rays was first used to study the propagation of light, hence the name geometrical optics.

Equation (2.8) defines the same phase, and therefore the same rays, in the linear and the weakly nonlinear cases. Ostrovsky [37] therefore calls the weakly nonlinear theory the "linear ray approximation". An alternative point of view is obtained by writing a solution of (2.10) in the form

(2.12)  $a(\theta,x) = A(s,\beta)\, F(\vartheta).$

In (2.12), F is arbitrary, while A and ϑ(θ,x) satisfy

(2.13)  $A_s + N A = 0,$

(2.14)  $\theta = \vartheta + F(\vartheta)\int_0^s M(s',\beta)\, A(s',\beta)\, ds'.$

According to (2.12)-(2.14), the solution is of the same form as the linearized solution, but it depends on a perturbed phase ϑ(ε^{-1}φ(x), x), instead of on the linear phase ε^{-1}φ(x). This point of view is used in the analytical method of characteristics [22].

To put (2.10) in standard form, we define the new variables [26],

(2.15)  $a(\theta,x) = E(s,\beta)\, u(\theta,\alpha,\beta), \qquad \alpha(s,\beta) = \int_0^s M(s',\beta)\, E(s',\beta)\, ds', \qquad E(s,\beta) = \exp\Big(-\int_0^s N(s',\beta)\, ds'\Big).$

Using (2.15) in (2.10), and renaming the independent variables, implies that u(x,t,β) satisfies the kinematic wave equation (2.4). Thus, (2.10) can be solved by using the method of characteristics and shock fitting.
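On a single ray, the change of variables needs only two quadratures: the amplitude factor E = exp(-∫N ds′) and the new time α = ∫M E ds′. The sketch below evaluates both with the trapezoidal rule. The lower limit s₀ is a free choice here (the integrals above start at s = 0). The test case uses assumed cylindrical-spreading coefficients N = 1/(2s), M = 1 on a ray starting at s₀ = 1, for which E = (s₀/s)^{1/2} and α = 2(√s - √s₀).

```python
import math


def transport_factors(M, N, s0, s, n=2000):
    """Trapezoidal approximations of the change-of-variables factors along one ray:
        E(s)     = exp(-int_{s0}^{s} N(s') ds'),
        alpha(s) = int_{s0}^{s} M(s') E(s') ds'.
    M, N are the transport coefficients as functions of s on a fixed ray.
    """
    h = (s - s0) / n
    grid = [s0 + k * h for k in range(n + 1)]
    log_e = [0.0]
    for k in range(n):
        log_e.append(log_e[-1] - 0.5 * h * (N(grid[k]) + N(grid[k + 1])))
    e = [math.exp(v) for v in log_e]
    alpha = sum(0.5 * h * (M(grid[k]) * e[k] + M(grid[k + 1]) * e[k + 1]) for k in range(n))
    return e[-1], alpha


# Cylindrical spreading (assumed): N = 1/(2s), M = 1, from s0 = 1 to s = 4.
E, alpha = transport_factors(lambda s: 1.0, lambda s: 0.5 / s, 1.0, 4.0)
# Exact values for this case: E = 0.5 and alpha = 2.
```

The growth of α like √s in the cylindrical case is why shocks in cylindrically spreading waves form more slowly than in plane waves of the same initial amplitude.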


2.3 The generalized Burgers' equation
Now consider (2.5) with a "viscous" term,

(2.16)  $\sum_{i=0}^{n} \frac{\partial}{\partial x_i} f_i(x,U) + g(x,U) = \mu \sum_{i,j=0}^{n} \frac{\partial}{\partial x_i}\Big(D_{ij}(x,U)\,\frac{\partial U}{\partial x_j}\Big).$

In (2.16), the D_ij are m×m matrices, and μ is a scalar (the "viscosity"). A balance between small nonlinear, nonplanar, and viscous effects is obtained in the limit (2.6), with

$\tilde\mu = \epsilon^{-2}\mu = O(1) \qquad \text{as } \epsilon \to 0^+.$

The phase satisfies (2.8), as before, and the transport equation is

(2.17)  $a_s + M a a_\theta + N a = P a_{\theta\theta},$

where M and N are given in (2.11) and

(2.18)  $P = \tilde\mu \sum_{i,j=0}^{n} \phi_{x_i}\phi_{x_j}\, \ell\cdot D_{ij}(x,0)\, r.$

The change of variables (2.15) puts (2.17) in the form

(2.19)  $u_t + u u_x = \nu(t)\, u_{xx},$

where

(2.20)  $\nu(\alpha,\beta) = P(s,\beta)\, E^{-1}(s,\beta)\, M^{-1}(s,\beta),$

and we write ν(α,β) as ν(t) in (2.19).

When ν is constant, (2.19) is called Burgers' equation, and when ν depends on t, it is called the generalized Burgers' equation (GBE). (It is customary to use "generalized" to denote variable coefficients, and "modified" to denote altered nonlinear terms.)

The initial value problem for Burgers' equation can be solved exactly by the Cole-Hopf transform [46]. The Cole-Hopf transform is a Bäcklund transform [39] between Burgers' equation and the linear heat equation. However, Nimmo and Crighton [34] show that ν = constant is the only case in which there is a Bäcklund transform connecting (2.19) with another parabolic PDE. Apart from some similarity solutions [41], [42], it seems necessary to use asymptotic or numerical methods to solve the GBE [35], [42].
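For constant ν, the Cole-Hopf transform u = -2ν φ_x/φ turns any positive solution of the heat equation φ_t = ν φ_xx into a solution of Burgers' equation. The sketch below takes φ as a sum of two exponential heat-equation solutions. This choice gives a viscous shock joining u₋ = -2νa to u₊ = -2νb that moves at the Rankine-Hugoniot speed (u₋ + u₊)/2. It then checks the Burgers residual by finite differences. The parameter values are illustrative only.

```python
import math


def cole_hopf_front(x, t, nu, a, b):
    """u = -2 nu phi_x / phi for phi = exp(a x + nu a^2 t) + exp(b x + nu b^2 t).

    Each exponential solves phi_t = nu phi_xx, hence so does phi.  The ratio w of
    the two exponentials is used to keep the formula numerically tame.
    """
    w = math.exp((b - a) * x + nu * (b * b - a * a) * t)
    return -2.0 * nu * (a + b * w) / (1.0 + w)


def burgers_residual(u, x, t, nu, h=1e-3):
    """Central-difference residual of u_t + u u_x - nu u_xx at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h ** 2
    return u_t + u(x, t) * u_x - nu * u_xx


nu, a, b = 0.1, -2.0, 1.0   # u_- = 0.4, u_+ = -0.2, front speed 0.1
u = lambda x, t: cole_hopf_front(x, t, nu, a, b)
worst = max(abs(burgers_residual(u, 0.1 * i, t, nu))
            for i in range(-10, 11) for t in (0.0, 0.5, 1.0))
```

The residual is at the level of the finite-difference error, and the front (where the two exponentials are equal) sits at x = 0.1t. No analogous linearization is available once ν depends on t, as noted above.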
Viscous effects on waves modelled by (2.16) are weak when ε ≫ μ, where the wavelength is of the order ε. Thus, this expansion uses a long wave approximation. Other long wave equations (e.g. the KdV equation for dispersive waves) are reviewed by Rosales [40].


3  DIFFRACTION 

The  equations  described  in  the  previous  section  are  all  based  on  a  locally 
one-dimensional  approximation  to  the  wave.  In  this  section,  we  describe  some 
asymptotic  equations,  involving  two  space  variables,  which  incorporate  diffractive 
effects. 

3.1  Weak  transverse  diffraction 

First, let us consider a hyperbolic system of conservation laws in two space dimensions,

(3.1)  $U_t + f(U)_x + g(U)_y = 0.$

The linearized phase velocity of a wave propagating in the x-direction is (λ, 0)^T, where λ is an eigenvalue of

$A = \nabla_U f(0).$

We assume that λ ≠ 0. We denote associated right and left eigenvectors by r and ℓ and normalize ℓ so that ℓ·r = 1. The group velocity of the wave is (λ, μ)^T, where

$\mu = \ell\cdot B r, \qquad B = \nabla_U g(0).$

For anisotropic waves, μ is generally nonzero. The equations of the spatial projections of the rays are

(3.2)  $y - \mu\lambda^{-1} x = \text{constant}.$

The following ansatz describes a weakly nonlinear wave propagating in the x-direction and diffracting slowly in a direction orthogonal to the ray projections (3.2):

(3.3)  $U = \epsilon\, a\big(\epsilon^{-1}(x-\lambda t),\, \epsilon^{-1/2}(y-\mu t),\, x, y, t\big)\, r + O(\epsilon^{3/2}), \qquad \epsilon \to 0^+ \text{ with } t = O(1).$

The wave amplitude a(θ,η,x,y,t) satisfies

(3.4)  $(a_t + \lambda a_x + \mu a_y + M a a_\theta)_\theta + Q a_{\eta\eta} = 0.$

In (3.4),

$M = \ell\cdot\nabla_U^2 f(0)(r,r), \qquad Q = \ell\cdot(B - \mu I)s,$

where s is a solution of

$(A - \lambda I)s + (B - \mu I)r = 0.$

Assuming that M and Q are nonzero, and rescaling variables in (3.4),

$a(\theta,\eta,x,y,t) = (MQ)^{-1}\, u(Q\theta, \eta, t, x-\lambda t, y-\mu t),$

implies that u solves (after renaming the independent variables; x - λt and y - μt occur only as parameters)

(3.5)  $(u_t + u u_x)_x + u_{yy} = 0.$

Equation (3.5) was first derived in the context of transonic flows, and it is called the unsteady transonic small disturbance equation (UTSDE) [6]. The equation is a weakly nonlinear extension of the parabolic approximation to the wave equation [2].

For the general system (2.16), the asymptotic solution is [20]

$U = \epsilon\, a\big(\epsilon^{-1}\phi(x), \epsilon^{-1/2}\psi(x), x\big)\, r(x) + O(\epsilon^{3/2}),$

where φ is a solution of the eikonal equation (2.8), r and ℓ are associated right and left eigenvectors, and ψ is constant along the rays associated with φ, i.e.

$\sum_{i=0}^{n} (\ell\cdot A_i r)\, \psi_{x_i} = 0.$

The amplitude a(θ,η,x) satisfies

$(a_s + M a a_\theta + N a - P a_{\theta\theta})_\theta + Q a_{\eta\eta} = 0.$

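Here, d/ds, M, N, and P are defined in (2.11) and (2.18), and

$Q = \ell\cdot K s,$

where s is a solution of

$H s + K r = 0,$

and

$H = \sum_{i=0}^{n} \phi_{x_i} A_i, \qquad K = \sum_{i=0}^{n} \psi_{x_i} A_i.$

We make the change of variables

(3.6)  $a(\theta,\eta,x) = E(s,\beta)\, u(\theta,\eta,\alpha,\beta),$

where E and α are defined in (2.15). Then u(x,y,t,β) satisfies

(3.7a)  $(u_t + u u_x - \nu(t) u_{xx})_x + \delta(t)\, u_{yy} = 0.$

In (3.7a), ν is defined in (2.20) and

$\delta = Q E^{-1} M^{-1}.$

This equation can be written in system form,

(3.7b)  $u_t + \left(\tfrac12 u^2\right)_x + \delta v_y = \nu u_{xx}, \qquad u_y - v_x = 0,$

and in potential form for φ, where u = φ_x and

(3.7c)  $\phi_{xt} + \phi_x\phi_{xx} - \nu\phi_{xxx} + \delta\phi_{yy} = 0.$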

If ν = 0, solutions of (3.7) may contain shocks. The jump conditions for (3.7b) imply that, if the equation of the shock position is x = s(y,t), then

$s_t = \langle u\rangle + \delta s_y^2.$

Here,

$\langle u\rangle = \tfrac12\Big\{\lim_{x\to s^+} u(x,y,t) + \lim_{x\to s^-} u(x,y,t)\Big\}$

is the average of u ahead of and behind the shock.

For plane waves, ν and δ are constants, and (assuming δ > 0) the change of variables

$t \to \nu^{-1} t, \qquad x \to \nu^{-1} x, \qquad y \to (\nu^2\delta)^{-1/2}\, y$

transforms (3.7a) to

(3.8)  $(u_t + u u_x - u_{xx})_x + u_{yy} = 0.$

Equation (3.8) is known by several names: the 2-D Burgers' equation; the Kuznetsov equation; the Zabolotskaya-Khokhlov (ZK) equation; or the Kuznetsov-Zabolotskaya-Khokhlov (KZK) equation. Equation (3.7) is the generalized Kuznetsov equation (GKE).

A GKE may be transformed into another GKE with different coefficients by a change of variables of the form

t̃ = [illegible],  x̃ = x + [illegible],  ỹ = [illegible],  ũ = [illegible],

where σ(t) = ρ'(t), and K and L are constants. Then ũ satisfies a GKE with coefficients ν̃ and δ̃ [formulas illegible in the source]. If ν = t^p and δ = t^q, then ν̃ = t̃^{p̃} and δ̃ = t̃^{q̃}, with exponents p̃, q̃ determined by p and q [formula illegible in the source].

For the cylindrical GKE (see section 5.3.1), p = 1 and q = -3. This transformation therefore reduces it to the planar Kuznetsov equation.

Unfortunately, not many exact solutions of these equations are known. Travelling wave solutions of the UTSDE (3.5),

$u = c + U(x - ct, y),$

satisfy the steady transonic small disturbance equation (STSDE), or Kármán-Guderley equation,

(3.9a)  $(U U_x)_x + U_{yy} = 0.$

This is often written in terms of a potential φ, where U = φ_x and

(3.9b)  $\phi_x\phi_{xx} + \phi_{yy} = 0.$

Equation (3.9) is nonlinear and of mixed type. It is hyperbolic if U < 0 and elliptic if U > 0. Solutions can be found by using: (a) group invariance properties (similarity solutions); (b) the hodograph transformation [6]. The hodograph transformation linearizes (3.9), but it is difficult to use if there are shocks.

Another reduction of the UTSDE to a system with two independent variables is obtained for scale-invariant solutions depending on t^{-1}x and t^{-1}y. It is convenient to transform (3.5) to "polar" coordinates

$r = x + \frac{y^2}{4t}, \qquad \zeta = \frac{y}{t}, \qquad w(r,\zeta,t) = u(x,y,t),$

which gives the cylindrical UTSDE

$\Big(w_t + w w_r + \frac{1}{2t}\, w\Big)_r + \frac{1}{t^2}\, w_{\zeta\zeta} = 0.$

The similarity solutions

$w = W(\rho,\zeta), \qquad \rho = \frac{r}{t} = \frac{x}{t} + \frac14\Big(\frac{y}{t}\Big)^2,$

satisfy

(3.10a)  $\big[(W - \rho)W_\rho + \tfrac12 W\big]_\rho + W_{\zeta\zeta} = 0.$

The potential form is W = Φ_ρ and

(3.10b)  $(\Phi_\rho - \rho)\Phi_{\rho\rho} + \Phi_{\zeta\zeta} + \tfrac12\Phi_\rho = 0.$

Equation (3.10) cannot be linearized by the hodograph transformation (because of the lower order term), but Zahalak and Myers [47] found some particular solutions in the hodograph plane.

Gibbons  and  Kodama  [12]  give  a  generalization  of  the  hodograph  transformation 
which  applies  directly  to  the  UTSDE  (3.5).  They  use  it  to  construct  a  family  of 
solutions  which  are  polynomials  in  appropriate  dependent  variables. 

There  seem  to  be  no  known  exact  solutions  of  the  Kuznetsov  equation  (3.8) 
that  involve  nontrivial  nonlinearity,  dissipation,  and  diffraction.  Treatments  thus  far 
have  used  numerical  or  approximate  methods  [41]. 

The UTSDE describes weakly nonlinear waves at singular rays [20]; in particular (3.10) describes a shock at a singular ray [47], [15], [20]. The UTSDE is also a nonlinear caustic equation (see the next section); and it should describe the transition from regular to Mach reflection for weak shocks. The Kuznetsov equation has been used extensively to model acoustic beams, especially in connection with parametric acoustic arrays [14].

3.2  Caustics 

The straightforward ray method expansions described in section 2 break down near caustics. A caustic is an envelope of rays. Straightforward ray methods predict that the wave amplitude is infinite at a caustic. In fact, the wave amplitude is limited by diffraction, and remains finite. However, the ratio of the amplitude at the caustic to the amplitude away from the caustic is unbounded as the wavelength ε → 0+. Therefore a modified asymptotic expansion is needed to describe a wave near a caustic.

The simplest case is a smooth convex caustic. On one side of the caustic (the "illuminated" region) there are two waves, an incident and a reflected wave. On the other side (the "shadow" region), the straightforward ray method expansion predicts that there is no wave field; in fact, according to linearized theory, the wave amplitude decays exponentially into the shadow region. The illuminated region is doubly covered by rays, while no rays reach the shadow region.

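Suppose that a wave forms a smooth convex caustic at the surface ρ(x) = 0, and let the phases of the waves in the illuminated region ρ > 0 be

$\phi(x) \pm \tfrac23\, \rho(x)^{3/2}.$

The associated null vectors are of the form

$r \pm \rho^{1/2} s.$

According to the linearized caustic theory for (2.5) [27], φ, ρ, r, and s satisfy:

$\det\begin{pmatrix} H & \rho K \\ K & H \end{pmatrix} = 0, \qquad \begin{pmatrix} H & \rho K \\ K & H \end{pmatrix}\begin{pmatrix} r \\ s \end{pmatrix} = 0,$

where

(3.11)  $H = \sum_{i=0}^{n} \phi_{x_i} A_i, \qquad K = \sum_{i=0}^{n} \rho_{x_i} A_i.$

Here, A_i is defined in (2.9). We also use the left null vectors ℓ, m defined by

$(m, \ell)\begin{pmatrix} H & \rho K \\ K & H \end{pmatrix} = 0.$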
The weakly nonlinear caustic expansion of solutions of (2.5) is

$U = \epsilon^{2/3}\, a\big(\epsilon^{-1}\phi(x), \epsilon^{-2/3}\rho(x), x\big)\, r(x) + O(\epsilon),$

as ε → 0+ with x = O(1), ρ = O(ε^{2/3}).

It uses the same phases as the linearized theory, but the amplitude a(θ,η,x) solves a nonlinear equation,

(3.12)  $\big[(M a - \eta)\, a_\theta\big]_\theta + a_{\eta\eta} = 0.$

In (3.12),

$M = \ell\cdot\sum_{i=0}^{n} \phi_{x_i}\nabla_U^2 f_i(x,0)(r,r).$

Equation (3.12) was derived by Guiraud [13] and Hayes [16] for gas dynamics, and by Hunter and Keller [19] for general systems.

To put (3.12) in a standard form, we define

$u(\theta,\eta) = M a(\theta,\eta),$

and we do not show the x-dependence explicitly, since x occurs in (3.12) as a parameter. Then, after renaming (θ,η) as (x,y), u(x,y) satisfies

(3.13)  $\big[(u - y)\, u_x\big]_x + u_{yy} = 0.$

The change of variables u → y + u reduces (3.13) to the STSDE (3.9).

When a smooth, weakly nonlinear wave (2.6) forms a caustic, its amplitude near the caustic is of the order ε^{5/6} ≪ ε^{2/3} [19]. Thus, the wave is described by the linearized version of (3.13), i.e. the Tricomi equation. However, if the incident wave contains a shock, the linearized theory is inconsistent, because it predicts a logarithmically infinite singularity in the reflected wave. Seebass [43] uses the nonlinear equation (3.13) to analyze a weak shock at a smooth caustic (although a complete formal justification of this procedure has not been given). He reduces (3.13) to the STSDE, and uses the hodograph transformation, but it does not seem possible to satisfy the shock conditions exactly.

A cusped caustic (or arête) is described by three functions, φ(x), ψ(x), and χ(x). The caustic is at

$27\psi^2 = 4\chi^3,$

and the cusp is at

$\psi = \chi = 0.$

To define the phases, let

$\Phi(x,\xi) = \phi(x) + \psi(x)\,\xi - \tfrac12\chi(x)\,\xi^2 + \tfrac14\xi^4,$

and denote the solutions of

$\Phi_\xi(x,\xi) = 0$

by ξ_j(x). Then the phases are

$\phi_j(x) = \Phi(x, \xi_j(x)).$

There are three phases inside the caustic and one (real) phase outside the caustic.
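The count of phases follows from the cubic Φ_ξ = ξ³ - χξ + ψ. The cubic has three real roots when its discriminant 4χ³ - 27ψ² is positive, which is inside the caustic, and one real root when the discriminant is negative. The sketch below computes the stationary points and phases at a single point x. It assumes the quartic phase function Φ = φ + ψξ - ½χξ² + ¼ξ⁴, which is our reading of the partly garbled original.

```python
import math


def pearcey_phases(phi, psi, chi):
    """Stationary points xi_j of Phi(xi) = phi + psi*xi - chi*xi**2/2 + xi**4/4 and the phases Phi(xi_j).

    Phi'(xi) = xi**3 - chi*xi + psi is a depressed cubic xi**3 + p*xi + q with p = -chi, q = psi.
    Returns (sorted phases, roots).  On the caustic itself (disc == 0) only one root is returned.
    """
    p, q = -chi, psi
    disc = 4.0 * chi ** 3 - 27.0 * psi ** 2
    if disc > 0:
        # Inside the caustic: three real roots, trigonometric form of the cubic formula.
        m = 2.0 * math.sqrt(chi / 3.0)
        arg = (3.0 * q / (2.0 * p)) * math.sqrt(-3.0 / p)
        arg = max(-1.0, min(1.0, arg))  # guard against rounding just outside [-1, 1]
        theta = math.acos(arg) / 3.0
        roots = [m * math.cos(theta - 2.0 * math.pi * k / 3.0) for k in range(3)]
    else:
        # Outside (or on) the caustic: one real root from Cardano's formula.
        d = math.sqrt(max(0.0, q * q / 4.0 + p ** 3 / 27.0))
        cbrt = lambda v: math.copysign(abs(v) ** (1.0 / 3.0), v)
        roots = [cbrt(-q / 2.0 + d) + cbrt(-q / 2.0 - d)]
    phase = lambda xi: phi + psi * xi - 0.5 * chi * xi ** 2 + 0.25 * xi ** 4
    return sorted(phase(xi) for xi in roots), roots


phases_in, roots_in = pearcey_phases(0.0, 0.0, 3.0)     # inside: three phases
phases_out, roots_out = pearcey_phases(0.0, 0.0, -1.0)  # outside: one phase
```

As a point crosses the curve 27ψ² = 4χ³, two of the three real stationary points merge and become complex. That is the coalescence which makes the straightforward ray expansion fail there.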

It follows from the linearized theory [27] that these functions, and their associated null-vectors, satisfy

$\det L = 0, \qquad L\begin{pmatrix} r \\ s \\ t \end{pmatrix} = 0, \qquad (n, m, \ell)\, L = 0,$

where L is a 3 × 3 block matrix built from H, K, J, ψ and χ [its entries are illegible in the source]. Here, H, K are defined in (3.11), and

$J = \sum_{i=0}^{n} \chi_{x_i} A_i.$

The weakly nonlinear cusped caustic expansion of solutions of (2.5) is

$U = \epsilon^{1/2}\, a\big(\epsilon^{-1}\phi(x), \epsilon^{-3/4}\psi(x), \epsilon^{-1/2}\chi(x), x\big)\, r(x) + o(\epsilon^{1/2}),$

as ε → 0+ with x = O(1), ψ = O(ε^{3/4}), χ = O(ε^{1/2}).

The equation for the amplitude a(θ,η,ζ,x) is the UTSDE again,

$(2a_\zeta + M a a_\theta)_\theta + a_{\eta\eta} = 0,$

where

$M = \ell\cdot\sum_{i=0}^{n} \phi_{x_i}\nabla_U^2 f_i(x,0)(r,r).$

The UTSDE was derived by Cramer and Seebass [8] for slowly focusing, weak shocks, and studied further by Obermeier [36]. The above derivation suggests that it should also describe strongly focusing weak shocks.
4 INTERACTING WAVES


So far, we have described asymptotic equations for a single wave. When several waves are present, they may interact and generate new waves. A wave may also generate a mean-field. The asymptotic equations for such processes are integro-differential equations.


4.1 Wave-wave interactions

For simplicity, we consider interactions between collinear waves, when the problem involves one space dimension. Interactions between oblique plane waves are described by similar equations, but the details are more complicated [18]. The asymptotic analysis for nonplanar wave interactions usually leads to a passage-through-resonance problem which has not been solved.

We denote the eigenvalues of A = ∇_U f(0) in (2.1) by

$\lambda_1 < \lambda_2 < \dots < \lambda_m,$

and right and left eigenvectors associated with λ_j are denoted by r_j and ℓ_j. We normalize ℓ_j so that ℓ_j·r_j = 1.

The asymptotic solution, for m interacting, weakly nonlinear waves is

(4.1)  $U = \epsilon\sum_{j=1}^{m} a_j\Big(\frac{k_j x - \omega_j t}{\epsilon}, x, t\Big)\, r_j + O(\epsilon^2), \qquad \epsilon \to 0^+ \text{ with } x, t = O(1).$

In (4.1), the wave amplitudes a_j(θ,x,t), and their derivatives, are zero-mean, almost periodic functions of θ. It is convenient to introduce explicitly wavenumbers k_j and frequencies ω_j, which satisfy the dispersion relation

$\omega_j = \lambda_j k_j, \qquad j = 1,\dots,m.$

We define

$\gamma_{jpq} = \frac{(\lambda_j - \lambda_p)\, k_j}{(\lambda_q - \lambda_p)\, k_q}$

for distinct j, p, q, so that

$\omega_j = \gamma_{jqp}\,\omega_p + \gamma_{jpq}\,\omega_q, \qquad k_j = \gamma_{jqp}\, k_p + \gamma_{jpq}\, k_q.$
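The weights that express wave j's frequency and wavenumber through those of waves p and q can be checked directly. In the sketch, γ_{abc} = (λ_a - λ_b)k_a / ((λ_c - λ_b)k_c). The symbol γ is our name for these weights, since the original symbol is not legible, and the eigenvalues and wavenumbers are arbitrary test values. For a resonant triad (k₁ + k₂ + k₃ = 0 and ω₁ + ω₂ + ω₃ = 0), the weights that enter the interaction integral both equal -1. This is what produces the argument -θ - ϑ in the three-wave equations.

```python
def gamma_weight(lam, k, a, b, c):
    """gamma_{abc} = (lam_a - lam_b) k_a / ((lam_c - lam_b) k_c) for distinct a, b, c."""
    return (lam[a] - lam[b]) * k[a] / ((lam[c] - lam[b]) * k[c])


def check_decomposition(lam, k, j, p, q):
    """Residuals of omega_j = g_jqp omega_p + g_jpq omega_q and of k_j = g_jqp k_p + g_jpq k_q."""
    omega = [l * kk for l, kk in zip(lam, k)]  # dispersion relation omega_i = lam_i k_i
    g_jqp = gamma_weight(lam, k, j, q, p)
    g_jpq = gamma_weight(lam, k, j, p, q)
    return (omega[j] - (g_jqp * omega[p] + g_jpq * omega[q]),
            k[j] - (g_jqp * k[p] + g_jpq * k[q]))


res_omega, res_k = check_decomposition([-1.0, 0.5, 2.0], [1.0, -3.0, 2.0], 0, 1, 2)

# A resonant triad for the same eigenvalues: k sums to 0 and so does omega = (-1, -1, 2).
lam_r, k_r = [-1.0, 0.5, 2.0], [1.0, -2.0, 1.0]
g_qpj = gamma_weight(lam_r, k_r, 2, 1, 0)   # equals -1
g_qjp = gamma_weight(lam_r, k_r, 2, 0, 1)   # equals -1
```

The decomposition holds for any nonresonant choice as well. Resonance only fixes the weights to -1.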

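The wave amplitudes satisfy the following system of resonant interaction equations [29],

(4.2)  $a_{jt} + \lambda_j a_{jx} + \frac{\partial}{\partial\theta}\Big\{\tfrac12 M_j a_j^2(\theta) + \sum_{p<q} \Gamma_{jpq} \lim_{T\to\infty}\frac1T\int_0^T a_p(\vartheta)\, a_q(\gamma_{qpj}\theta + \gamma_{qjp}\vartheta)\, d\vartheta\Big\} = 0.$

In (4.2), we show only the θ-dependence, and

Σ_{p<q} = sum over all pairs (p,q) with 1 ≤ p < q ≤ m, p ≠ j and q ≠ j.

The coefficients in (4.2) are

$M_j = \ell_j\cdot\nabla_U^2 f(0)(r_j, r_j), \qquad \Gamma_{jpq} = \ell_j\cdot\nabla_U^2 f(0)(r_p, r_q).$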

The simplest case is when: (a) there are three interacting waves, e.g. (2.1) is a 3×3 system; (b) the waves are periodic in θ, and we normalize the period to 2π; (c) there are no spatial modulations, i.e. a_j is independent of x; (d) the frequencies and wave numbers satisfy the resonance condition

$\omega_1 + \omega_2 + \omega_3 = 0, \qquad k_1 + k_2 + k_3 = 0.$

Then (4.2) becomes

(4.3)  $a_{jt}(\theta) + \frac{\partial}{\partial\theta}\Big[\tfrac12 M_j a_j^2(\theta) + \Gamma_{jpq}\,\frac{1}{2\pi}\int_0^{2\pi} a_p(\vartheta)\, a_q(-\theta-\vartheta)\, d\vartheta\Big] = 0,$

where (j,p,q) is a cyclic permutation of (1,2,3).

Equations (4.2) and (4.3) are in conservation form. They are valid in the weak sense after shocks form [3].


Rescaling  variables  in  (4.3), 

Uj(0,t)  =  Mjaj(0,t), 

r  M.  if  M.  ^  0 

M.  =  J  > 

J  ^0  if  =  0 

implies  that  {uj(x,t)}  satisfies 

u^^(x,t)  +  ^  /q^  U2(y,t)u2(-x-y,t)  dy}  =  0, 

(4.4)  u.2^.(x,t)  +  ■^^2  2^  ^0^  U3(y,t)u^(-x-y,t)  dy}  =  0, 

^  ^3  ^  ^0^  u^(y,t)u2(-x-y.t)  dy)  =  0. 

In  (4.4), 

.  1  if  M.  0 

J  ^  0  if  Mj  =  0 
M  .  r . 

r  =  -i-  J  pq,  (j,p,q)  =  (1,2,3),  (2,3,1),  (3,1,2). 
p  q 

Smooth  solutions  of  (4.4)  satisfy  the  following  conservation  laws  for 
1  £  P  <  q  £  3: 

(-i-5)  rp/Q''u^’^(x,t)dx  -  r^/Q''’’up^(x,t)dx  =  constant. 

Equation (4.5) is a generalization of the Manley-Rowe relations for dispersive waves [7]. Once shocks form, (4.5) must be modified to allow for the decrease in entropy across a shock (see [31] for the appropriate modification in the case of gas dynamics). If the interaction coefficients $\{\Gamma_j\}$ are not all of the same sign, then (4.5) implies that solutions of (4.4) are bounded in the $L^2$-norm, because each $u_j$ then appears in some law of (4.5) that combines the integrals $\int_0^{2\pi}u^2\,dx$ with positive weights only. If the interaction coefficients are all of the same sign, then (4.4) has "explosively" unstable solutions, which blow up in finite time. The weakly nonlinear approximation is inconsistent after the blow-up time.
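For $\sigma_j = 0$ (no self-steepening), the Fourier modes of (4.4) decouple by wavenumber. Writing $u_j = \sum_k \hat u_{j,k}(t)e^{ikx}$, a short calculation (ours, not from the text) gives the classical triad equations $d\hat u_{j,k}/dt = -ik\Gamma_j\,\overline{\hat u_{p,k}\hat u_{q,k}}$ for cyclic $(j,p,q)$. The sketch below integrates one triad with RK4 and checks that the mode-wise form of (4.5), $\Gamma_p|\hat u_q|^2 - \Gamma_q|\hat u_p|^2$, stays constant. The coefficients and initial data are arbitrary, and the helper names are hypothetical.

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the text).
def triad_rhs(a, k, gamma):
    """d/dt of the Fourier coefficients (a_1, a_2, a_3) at wavenumber k, sigma_j = 0.

    Cyclic (j, p, q) = (1, 2, 3), (2, 3, 1), (3, 1, 2):  da_j/dt = -i k Gamma_j conj(a_p a_q).
    """
    a1, a2, a3 = a
    products = np.array([a2 * a3, a3 * a1, a1 * a2])
    return -1j * k * gamma * np.conj(products)

def integrate_rk4(a, k, gamma, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration of the triad equations."""
    a = np.array(a, dtype=complex)
    for _ in range(n_steps):
        s1 = triad_rhs(a, k, gamma)
        s2 = triad_rhs(a + 0.5 * dt * s1, k, gamma)
        s3 = triad_rhs(a + 0.5 * dt * s2, k, gamma)
        s4 = triad_rhs(a + dt * s3, k, gamma)
        a = a + (dt / 6.0) * (s1 + 2 * s2 + 2 * s3 + s4)
    return a

def manley_rowe(a, gamma, p, q):
    """Mode-wise invariant Gamma_p |a_q|^2 - Gamma_q |a_p|^2 (0-based indices, p < q)."""
    return gamma[p] * abs(a[q]) ** 2 - gamma[q] * abs(a[p]) ** 2

gamma = np.array([1.0, -0.5, -0.5])            # mixed signs: non-explosive case
a0 = np.array([0.1 + 0j, 0.8 + 0.1j, 0.6 - 0.2j])
a_end = integrate_rk4(a0, k=1, gamma=gamma, dt=2e-3, n_steps=5000)
for p, q in [(0, 1), (0, 2), (1, 2)]:
    print(p, q, manley_rowe(a0, gamma, p, q), manley_rowe(a_end, gamma, p, q))
```

RK4 does not conserve quadratic invariants exactly, but with this step size the drift is far below plotting accuracy. If the three $\Gamma_j$ share one sign, the same run can be used to watch the explosive growth described above.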

Equation  (4.4)  simplifies  for  some  special  types  of  solutions,  namely:  (a) 
sawtooth  waves;  (b)  travelling  waves;  (c)  separable  solutions. 
(a) Sawtooth waves

We define the sawtooth function $S(x)$ by

$$S(x) = x, \quad |x| < \pi, \qquad S(x + 2\pi) = S(x).$$

Equation (4.4) has solutions

$$u_j(x,t) = \alpha_j(t)\,S(x - c_j), \tag{4.6}$$
where 


and $\{\alpha_j(t)\}$ satisfies the ODEs,
(4.7)  +  ,.C.^  = 

The  solution  (4.6)  is  admissible  if 


Unless the $\Gamma_j$'s are all positive, solutions of (4.7) typically become inadmissible after a finite time.

A simple, but interesting, special case of (4.4) which shows what can happen when a sawtooth wave solution becomes inadmissible is

$$M_2 = M_3 = \Gamma_2 = \Gamma_3 = 0, \qquad M_1 = 1, \qquad \Gamma_1 = -1, \qquad u_2 = u_3 = S(x).$$

The equation for $u_1$ is

$$u_t + uu_x + S(x) = 0.$$

This has the sawtooth wave solution

$$u(x,t) = -\tan(t)\,S(x),$$

which is admissible for $-\pi/2 < t < 0$. Majda, Rosales, and Schonbeck [31] show that when $t > 0$, the shocks become "cusped rarefaction waves", containing a square root singularity. Their solution is: for $0 < t < \pi/2$,

$$u = \begin{cases} -(\pi^2 - x^2)^{1/2}, & \pi\cos t < x < \pi,\\ -x\tan t, & -\pi\cos t < x < \pi\cos t,\\ (\pi^2 - x^2)^{1/2}, & -\pi < x < -\pi\cos t;\end{cases}$$

and for $t \ge \pi/2$,

$$u = \begin{cases} -(\pi^2 - x^2)^{1/2}, & 0 < x < \pi,\\ (\pi^2 - x^2)^{1/2}, & -\pi < x < 0.\end{cases}$$
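The piecewise formulas above can be checked numerically. This sketch codes the sawtooth solution for $t \le 0$ and the cusped-rarefaction solution for $t > 0$ on one period $-\pi < x < \pi$, where $S(x) = x$. It then evaluates the residual of $u_t + uu_x + x = 0$ by central differences at points away from the kinks $x = \pm\pi\cos t$ and away from the stationary shock at $x = 0$. Function names, step size and tolerances are our own illustrative choices.

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the text).
def mrs_solution(x, t):
    """Solution of u_t + u u_x + x = 0 on -pi < x < pi.

    t <= 0       : sawtooth u = -tan(t) x (admissible for -pi/2 < t <= 0)
    0 < t < pi/2 : cusped rarefactions near x = +-pi joined by the ramp -x tan t
    t >= pi/2    : steady profile with a stationary shock at x = 0
    """
    x = np.asarray(x, dtype=float)
    root = np.sqrt(np.maximum(np.pi**2 - x**2, 0.0))
    if t <= 0:
        return -np.tan(t) * x
    if t < np.pi / 2:
        xc = np.pi * np.cos(t)
        return np.where(x > xc, -root, np.where(x < -xc, root, -x * np.tan(t)))
    return np.where(x > 0, -root, root)

def residual(x, t, h=1e-5):
    """Central-difference estimate of u_t + u u_x + x at points where u is smooth."""
    u = mrs_solution(x, t)
    u_t = (mrs_solution(x, t + h) - mrs_solution(x, t - h)) / (2 * h)
    u_x = (mrs_solution(x + h, t) - mrs_solution(x - h, t)) / (2 * h)
    return u_t + u * u_x + x

t = 0.6
xc = np.pi * np.cos(t)
ramp = np.linspace(-xc + 0.05, xc - 0.05, 11)
cusp = np.linspace(xc + 0.05, np.pi - 0.05, 11)
print(np.max(np.abs(residual(ramp, t))), np.max(np.abs(residual(cusp, t))))
```

The solution is continuous at $x = \pm\pi\cos t$, where both branches equal $\mp\pi\sin t$. For $t \ge \pi/2$ it jumps from $+\pi$ to $-\pi$ at $x = 0$, which is a stationary shock with zero Rankine-Hugoniot speed.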

A particular solution of (4.6) and (4.7) is

$$u_j = t^{-1}K_j\,S(x - c_j), \tag{4.8}$$

where the constants $K_j$ satisfy

If $\sigma_jK_j > 0$, $j = 1,2,3$, then (4.8) is admissible for $t > 0$, and decays to zero as $t \to +\infty$. If $\sigma_jK_j < 0$, then (4.8) is admissible for $t < 0$, and blows up as $t \to 0^-$. Otherwise, (4.8) is inadmissible for all $t$.


(b)  Travelling  waves 

Nonlinearity steepens the wave profile of any genuinely nonlinear hyperbolic wave, so a single such wave has no periodic travelling-wave solutions. Wave-wave interactions can balance nonlinearity, so that interacting travelling waves are possible. They are described by solutions of (4.4) of the form

$$u_j = U_j(x - c_jt - \zeta_j),$$

where the wave velocities $c_j$ and the phase shifts $\zeta_j$ satisfy

$$c_1 + c_2 + c_3 = 0, \qquad \zeta_1 + \zeta_2 + \zeta_3 = 0 \pmod{2\pi},$$

and $\{U_j\}$ solves a nonlinear system of integral equations,

$$\tfrac12\sigma_jU_j^2(x) - c_jU_j(x) + \frac{\Gamma_j}{2\pi}\int_0^{2\pi}U_p(y)\,U_q(-x-y)\,dy = K_j. \tag{4.9}$$

In (4.9), $K_1$, $K_2$, $K_3$ are constants.

Pego [38] gives an exact solution in the special case of gas dynamics (see section 5.4). Suppose that $\sigma_j = 0$ and $c_j\Gamma_j > 0$. This implies that the $\Gamma_j$ are of mixed signs, which is the nonexplosive case. Then (4.9) with $K_j = 0$ has the solution

$$U_j(x) = 2\Big(\frac{c_pc_q}{\Gamma_p\Gamma_q}\Big)^{1/2}\cos(x).$$

Solutions of (4.9) with $\sigma_j \ne 0$ can be found in the limit $|\Gamma_j| \gg 1$ by perturbing off this solution.
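Here is a quick numerical check of the cosine solution. With $\sigma_j = 0$ and $K_j = 0$, (4.9) reduces to $-c_jU_j + (\Gamma_j/2\pi)\int_0^{2\pi}U_p(y)U_q(-x-y)\,dy = 0$. The sketch evaluates the correlation with the periodic trapezoidal rule, which is exact for low-order trigonometric polynomials, and confirms that the residual vanishes to rounding error. The values of $c_j$ and $\Gamma_j$ are arbitrary choices satisfying $c_1 + c_2 + c_3 = 0$ and $c_j\Gamma_j > 0$. The function names are hypothetical.

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the text).
def correlation(Up, Uq, x, n=64):
    """(1/2pi) * int_0^{2pi} Up(y) Uq(-x-y) dy by the periodic trapezoidal rule."""
    y = 2 * np.pi * np.arange(n) / n
    return np.array([np.mean(Up(y) * Uq(-xi - y)) for xi in np.atleast_1d(x)])

def cosine_wave_amplitudes(c, gamma):
    """A_j = 2*sqrt(c_p c_q / (Gamma_p Gamma_q)) for cyclic (j, p, q); needs c_j*Gamma_j > 0."""
    return np.array([2.0 * np.sqrt(c[(j + 1) % 3] * c[(j + 2) % 3]
                                   / (gamma[(j + 1) % 3] * gamma[(j + 2) % 3]))
                     for j in range(3)])

c = np.array([1.0, 0.5, -1.5])        # wave velocities summing to zero
gamma = np.array([2.0, 2.0, -1.0])    # c_j * Gamma_j > 0 for each j
amps = cosine_wave_amplitudes(c, gamma)
U = [lambda y, A=A: A * np.cos(y) for A in amps]
x = np.linspace(0.0, 2 * np.pi, 9)
for j in range(3):
    p, q = (j + 1) % 3, (j + 2) % 3
    res = -c[j] * U[j](x) + gamma[j] * correlation(U[p], U[q], x)   # sigma_j = 0, K_j = 0
    print(j, np.max(np.abs(res)))
```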


(c)  Separable  solutions 

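The separable solutions of (4.4) are

$$u_j(x,t) = t^{-1}X_j(x), \quad\text{where}\quad \partial_x\Big[\tfrac12\sigma_jX_j^2(x) + \frac{\Gamma_j}{2\pi}\int_0^{2\pi}X_p(y)\,X_q(-x-y)\,dy\Big] = X_j(x). \tag{4.10}$$

A particular solution of (4.10), when the $\Gamma_j$ have the same signs, is the sawtooth wave solution (4.8). Also, for $\sigma_j = 0$, (4.10) has the solution

$$X_j(x) = 2\gamma\,(\Gamma_p\Gamma_q)^{-1/2}\cos(x - \zeta_j),$$

where

$$\zeta_1 + \zeta_2 + \zeta_3 = -\frac{\pi}{2} \pmod{2\pi}, \qquad \operatorname{sgn}\Gamma_1 = \operatorname{sgn}\Gamma_2 = \operatorname{sgn}\Gamma_3 = \gamma.$$

Small amplitude solutions of (4.10) with $\sigma_j \ne 0$ may be found by perturbing off this solution in the limit $|\Gamma_j| \gg 1$.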
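The separable cosine solution can be tested against the full time-dependent equation (4.4) rather than against (4.10). The sketch builds $u_j = t^{-1}A_j\cos(x - \zeta_j)$ with $A_j = 2\gamma(\Gamma_p\Gamma_q)^{-1/2}$. It then estimates $u_{jt} + \partial_x\{(\Gamma_j/2\pi)\int_0^{2\pi}u_p(y,t)u_q(-x-y,t)\,dy\}$ (the case $\sigma_j = 0$) by central differences. The phases and coefficients are arbitrary values that satisfy the stated constraints, and the helper names are hypothetical.

```python
import numpy as np

# Illustrative helpers (hypothetical names, not from the text).
def separable_amplitudes(gamma):
    """A_j = 2*s*(Gamma_p*Gamma_q)**(-1/2), where s is the common sign of the Gamma_j."""
    s = np.sign(gamma[0])
    if not np.all(np.sign(gamma) == s):
        raise ValueError("the Gamma_j must all have the same sign")
    return np.array([2 * s / np.sqrt(gamma[(j + 1) % 3] * gamma[(j + 2) % 3]) for j in range(3)])

def u(j, x, t, amps, zeta):
    """Separable solution u_j(x, t) = A_j cos(x - zeta_j) / t."""
    return amps[j] * np.cos(x - zeta[j]) / t

def correlation(j, x, t, amps, zeta, n=64):
    """(1/2pi) int_0^{2pi} u_p(y,t) u_q(-x-y,t) dy for cyclic (j, p, q), trapezoidal rule."""
    p, q = (j + 1) % 3, (j + 2) % 3
    y = 2 * np.pi * np.arange(n) / n
    return np.array([np.mean(u(p, y, t, amps, zeta) * u(q, -xi - y, t, amps, zeta))
                     for xi in np.atleast_1d(x)])

def pde_residual(j, x, t, gamma, amps, zeta, h=1e-4):
    """Central-difference estimate of u_jt + d/dx[Gamma_j * correlation] (sigma_j = 0)."""
    u_t = (u(j, x, t + h, amps, zeta) - u(j, x, t - h, amps, zeta)) / (2 * h)
    c_x = (correlation(j, x + h, t, amps, zeta) - correlation(j, x - h, t, amps, zeta)) / (2 * h)
    return u_t + gamma[j] * c_x

gamma = np.array([1.0, 2.0, 0.5])                 # all of one sign (gamma = +1)
zeta = np.array([0.3, -0.4, -np.pi / 2 + 0.1])    # phases summing to -pi/2
amps = separable_amplitudes(gamma)
x = np.linspace(0.0, 2 * np.pi, 7)
print([float(np.max(np.abs(pde_residual(j, x, 1.5, gamma, amps, zeta)))) for j in range(3)])
```

The residual is limited only by the $O(h^2)$ difference error. With $t < 0$ the same functions describe a solution that blows up as $t \to 0^-$.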


The resonant interaction equations can be generalized in a straightforward way to include other effects, such as weak viscosity [29] or dispersion. For example, consider (3.1), and suppose that the wave motion is isotropic ($l_j\cdot Br_j = 0$). Then the following asymptotic solution describes interacting waves with diffraction:

m  k-x-cJ-t  n 

U  =  f  -  ^  aj(-i-^,— ^,x,y,t)rj  +  0(r),  e  ^  0+  with  x,t  = 

The  amplitudes  aj(  ^,/?,x,y,t)  satisfy 

p<q  li"  T/o^(«\l'‘qpj*'+'‘qjp«>^«l)  +  =  »• 

where

$$Q_j = l_j\cdot Bs_j, \qquad (A - \lambda_jI)s_j + Br_j = 0.$$


4.2 Wave-mean field interactions

Averaging (2.1) with respect to $x$ shows that the mean of a bounded solution is constant. Therefore, waves cannot generate a mean field. However, suppose that there is a rapidly varying source term, so that

$$U_t + f(U)_x + \varepsilon\,g(\varepsilon^{-1}U) = 0. \tag{4.11}$$

(We consider one space dimension for simplicity; the analysis extends easily to several dimensions.) Then (4.11) has the following asymptotic solution,

$$U = \varepsilon a\Big(\frac{x - \lambda t}{\varepsilon}, x, t\Big)r + \varepsilon\bar U(x,t) + O(\varepsilon^2), \quad \varepsilon \to 0^+ \text{ with } x, t = O(1). \tag{4.12}$$

In (4.12), $\lambda$ is an eigenvalue of $A = \nabla f(0)$, and $r$ and $l$ are the corresponding right and left eigenvectors, with $l\cdot r = 1$. We assume that $a$ and its derivatives are periodic (or almost periodic) functions of $\theta$ with zero mean. It is also straightforward to include several waves, as in section 4.1. The mean field $\bar U$ satisfies the semi-linear equations

$$\bar U_t + A\bar U_x + \langle g\rangle = 0, \tag{4.13a}$$

where

$$\langle g\rangle(x,t) = \lim_{T\to+\infty}\frac1T\int_0^T g[\bar U(x,t) + a(\theta,x,t)r]\,d\theta.$$
The wave amplitude $a$ satisfies

$$a_t + \lambda a_x + Maa_\theta + l\cdot\big[g(\bar U + ar) - \langle g\rangle\big] = 0, \tag{4.13b}$$

where $M$ is given in (2.12). The wave drives the mean field, and the mean field modulates the wave.

A special case is when (4.11) is semi-linear, meaning that

$$f(U) = AU.$$

Then $V = \varepsilon^{-1}U$ satisfies

$$V_t + AV_x + g(V) = 0.$$

The asymptotic solution is [32], [45]

$$V = a\Big(\frac{x - \lambda t}{\varepsilon}, x, t\Big)r + \bar U(x,t) + O(\varepsilon), \quad \varepsilon \to 0^+ \text{ with } x, t = O(1),$$

where $a$ and $\bar U$ satisfy (4.13) with $M = 0$.

5  NONLINEAR  GEOMETRICAL  ACOUSTICS 

Nonlinear acoustics is a well-developed subject with applications to sonic boom research, remote sensing in the ocean, and ultrasonic technology. Here, we summarize the basic asymptotic equations of nonlinear acoustics. They are obtained as a special case of the general theory described in the previous sections. For other reviews, see Crighton [9], [10] and Hamilton [14].

5.1  Equations  of  motion  of  a  compressible  fluid 

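The equations of motion of a (one species) compressible fluid are

$$\rho_t + \operatorname{div}(\rho\mathbf u) = 0,$$
$$(\rho\mathbf u)_t + \operatorname{div}(\rho\mathbf u\otimes\mathbf u - \mathbf T) = \rho\mathbf F, \tag{5.1}$$
$$\big[\rho(\tfrac12\mathbf u\cdot\mathbf u + e)\big]_t + \operatorname{div}\big[(\tfrac12\mathbf u\cdot\mathbf u + e)\rho\mathbf u - \mathbf T\mathbf u + \mathbf q\big] = 0.$$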

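In (5.1), $\rho$ is the mass density of the fluid, $\mathbf u \in \mathbb R^n$ is the fluid velocity, $\mathbf T$ is the Cauchy stress tensor, $e$ is the specific internal energy, $\mathbf q$ is the heat flux vector, and $\mathbf F$ is the body force per unit mass. We neglect any heat sources. The constitutive equations for $\mathbf T$, $\mathbf q$ and $e$ are

$$\mathbf T = [-p + \mu'\operatorname{div}\mathbf u]\,\mathbf I + 2\mu\mathbf E, \qquad \mathbf E = \tfrac12(\nabla\mathbf u + \nabla\mathbf u^{T}),$$
$$\mathbf q = -\kappa\nabla T.$$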

Here, $p$ is the pressure, $\mu$ is the shear viscosity, $\mu' = \mu_B - \frac23\mu$ is the dilatational viscosity, with $\mu_B$ the bulk viscosity, $\kappa$ is the thermal conductivity, and $T$ is the temperature. The internal energy, temperature, and pressure are functions of $\rho$ and $S$, and they satisfy the thermodynamic relation

$$T\,dS = de + p\,d(1/\rho).$$

The conductivity and viscosities $\kappa$, $\mu$, and $\mu'$ are also functions of $\rho$ and $S$. We define the sound speed $c(\rho,S)$ by

$$c^2 = \frac{\partial p}{\partial\rho}(\rho,S).$$
For simplicity, we shall consider a polytropic gas, for which

$$e = c_vT, \qquad p = R\rho T = K\exp(S/c_v)\,\rho^{\gamma},$$

where $c_v$ is the specific heat at constant volume, and

$$\gamma = 1 + R/c_v$$

is the ratio of specific heats. One obtains similar asymptotic equations for general equations of state; only the values of the coefficients are affected.

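The non-conservative form of (5.1) is

$$\frac{D\rho}{Dt} + \rho\operatorname{div}\mathbf u = 0,$$
$$\rho\frac{D\mathbf u}{Dt} + \nabla p = \nabla(\mu'\operatorname{div}\mathbf u) + \operatorname{div}(2\mu\mathbf E) + \rho\mathbf F, \tag{5.2}$$
$$\rho T\frac{DS}{Dt} = 2\mu\,\mathbf E:\mathbf E + \mu'(\operatorname{div}\mathbf u)^2 + \operatorname{div}(\kappa\nabla T),$$

where

$$\frac{D}{Dt} = \frac{\partial}{\partial t} + \mathbf u\cdot\nabla$$

is the material derivative.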

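Linearizing the nondissipative version of (5.2), with $\mathbf F = 0$, about a constant solution, $\rho = \rho_0$, $\mathbf u = \mathbf u_0$, $S = S_0$, and $c = c_0$, we obtain the acoustics equations,

$$d_t\rho + \rho_0\operatorname{div}\mathbf u = 0,$$
$$d_t\mathbf u + c_0^2\rho_0^{-1}\nabla\rho + \rho_0^{-1}p_{S0}\nabla S = 0, \tag{5.3}$$
$$d_tS = 0,$$

where $d_t = \partial_t + \mathbf u_0\cdot\nabla$ and $p_{S0} = \partial p/\partial S\,(\rho_0,S_0)$. The plane wave solutions of (5.3) are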

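$$\rho = \hat\rho\exp i(\mathbf k\cdot\mathbf x - \omega t), \qquad S = \hat S\exp i(\mathbf k\cdot\mathbf x - \omega t), \qquad \mathbf u = \hat{\mathbf u}\exp i(\mathbf k\cdot\mathbf x - \omega t),$$

where

$$(\omega - \mathbf u_0\cdot\mathbf k)^n\big[(\omega - \mathbf u_0\cdot\mathbf k)^2 - c_0^2|\mathbf k|^2\big] = 0.$$

The root

$$(\omega - \mathbf u_0\cdot\mathbf k)^2 = c_0^2|\mathbf k|^2$$

is the dispersion relation of sound waves. The associated null vector is

$$\begin{pmatrix}\rho_0\\ c_0\hat{\mathbf k}\\ 0\end{pmatrix}, \qquad \hat{\mathbf k} = |\mathbf k|^{-1}\mathbf k.$$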

Sound waves are compressive, longitudinal, and isentropic. The root

$$\omega = \mathbf u_0\cdot\mathbf k$$

is a multiple eigenvalue in more than one space dimension. Such waves are convected by the background flow. The associated null space is spanned by vectors of the form

$$\begin{pmatrix}p_{S0}\\ 0\\ -c_0^2\end{pmatrix} \quad\text{or}\quad \begin{pmatrix}0\\ \mathbf k^{\perp}\\ 0\end{pmatrix}, \qquad\text{where } \mathbf k^{\perp}\cdot\mathbf k = 0.$$

These waves carry either entropy at constant pressure, or vorticity.
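The wave families of (5.3) can be seen directly as eigenvalues. For a plane wave $v\,e^{i(\mathbf k\cdot\mathbf x - \omega t)}$, with unknowns ordered $(\rho, u_1, \dots, u_n, S)$, (5.3) becomes $\omega v = W(\mathbf k)v$ for a constant matrix $W$. The sketch below builds $W$ and checks two things: its eigenvalues are $\mathbf u_0\cdot\mathbf k \pm c_0|\mathbf k|$ plus $\mathbf u_0\cdot\mathbf k$ with multiplicity $n$, and the sound and entropy null vectors above are eigenvectors. The numerical values (including $p_{S0} = 50$) are arbitrary, and the helper name is hypothetical.

```python
import numpy as np

# Illustrative helper (hypothetical name, not from the text).
def acoustic_symbol(k, u0, rho0, c0, p_s0):
    """Matrix W(k) with omega*v = W(k) v for plane waves v*exp(i(k.x - omega*t)) of (5.3).

    Unknowns are ordered (rho, u_1, ..., u_n, S).
    """
    k = np.asarray(k, dtype=float)
    n = k.size
    w = np.dot(u0, k) * np.eye(n + 2)     # Doppler shift from d_t = d/dt + u0.grad
    w[0, 1:n + 1] += rho0 * k              # rho0 div u
    w[1:n + 1, 0] += c0**2 / rho0 * k      # c0^2 rho0^{-1} grad rho
    w[1:n + 1, n + 1] += p_s0 / rho0 * k   # rho0^{-1} p_S0 grad S
    return w

k = np.array([1.0, 2.0, 2.0])              # |k| = 3
u0 = np.array([10.0, 0.0, 0.0])            # u0.k = 10
W = acoustic_symbol(k, u0, rho0=1.2, c0=340.0, p_s0=50.0)
print(np.sort(np.linalg.eigvals(W).real))  # sound: 10 -/+ 1020; convected: 10 (three times)
```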


5.2  A  single  sound  wave 

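We begin with two examples. The first is a sound wave propagating through a stratified fluid. The equations of motion in one space dimension, including viscosity, heat conduction, and a gravitational body force, are:

$$\begin{pmatrix}\rho\\ u\\ S\end{pmatrix}_t + \begin{pmatrix}u & \rho & 0\\ c^2/\rho & u & p_S/\rho\\ 0 & 0 & u\end{pmatrix}\begin{pmatrix}\rho\\ u\\ S\end{pmatrix}_x = \begin{pmatrix}0\\ -g\\ 0\end{pmatrix} + \begin{pmatrix}0\\ \rho^{-1}(\nu u_x)_x\\ (\rho T)^{-1}\big[\nu u_x^2 + (\kappa T_x)_x\big]\end{pmatrix}.$$

Here, $\nu = \frac43\mu + \mu_B$. We suppose that the unperturbed state is one of hydrostatic equilibrium,

$$\rho = \rho_0(x), \qquad u = 0, \qquad S = S_0(x),$$
$$p = p_0(x) = \exp(S_0/c_v)\,\rho_0^{\gamma}, \qquad c^2 = c_0^2(x) = \gamma p_0/\rho_0,$$

where

$$\frac{dp_0}{dx} = -g\rho_0.$$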

The  scale  height  of  the  stratification  is  a  typical  value  of 

A special case is isothermal equilibrium, when the fluid is exponentially stratified,

$$\rho_0 = \rho_*\exp(-x/H), \qquad p_0 = p_*\exp(-x/H), \qquad c_0 = (\gamma gH)^{1/2}.$$

Here, $H$ is the scale height, $\rho_*$ and $p_*$ are the density and pressure at $x = 0$, and the sound speed is constant. In the atmosphere, the sound speed is not constant: it varies from about 330 m s$^{-1}$ at sea level to 300 m s$^{-1}$ at 10 km.

To explain when a sound wave may be described using the weakly nonlinear theory, we introduce the acoustic Mach number

$$M_a = \frac{u_{\max}}{c_0}.$$

Here, $u_{\max}$ is the maximum particle velocity in the sound wave. We denote a typical wavenumber by $k$, i.e.

$$|\nabla\mathbf u|_{\max} = O(k\,u_{\max}).$$

The wave is small amplitude if $M_a \ll 1$. The cumulative effect of nonlinearity is important over propagation distances $L_{nl} = O(M_ak)^{-1}$. A more precise value is

$$L_{nl} = (LM_ak)^{-1},$$

where $L$ is the parameter of nonlinearity of the fluid, defined below (5.4). Then $L_{nl}$ is the shock formation distance for a plane wave with maximum slope $k$. Nondimensionalizing lengths by $L_{nl}$, the weakly nonlinear theory describes the propagation of waves of amplitude order $\varepsilon \ll 1$, and wavenumber of order $\varepsilon^{-1}$, over distances of the order one.


A typical width of the N-wave in a sonic boom is 100 m ($k = 1/50$ m$^{-1}$). The strength of a strong boom is $M_a \approx 10^{-3}$. This gives $L_{nl} \approx 40$ km. For a 20 kHz ultrasonic wave in water, at standard conditions, with strength $M_a = 10^{-4}$ (which corresponds to maximum overpressures of two atmospheres), one finds that $L_{nl} \approx 35$ m.

The importance of shear viscosity is measured by the acoustic Reynolds number,

$$Re = \frac{\rho_0c_0}{\mu k}.$$

Viscous effects are important over propagation distances of the order

$$L_v = Re\,k^{-1}.$$

11  6 

For  the  sonic  boom  in  air  with  k  =  m~  .  this  gives  km.  Thus 

shear viscosity has negligible influence over most of the N-wave. However, it may have an important effect in spreading out the shock waves. Also, dissipation due to relaxation effects, which we shall not consider here, is usually more important than that due to shear viscosity. For a 20 kHz ultrasonic wave in water, $L_v \sim 10$ km is also much longer than the nonlinear lengthscale. However, at higher frequencies of about 100 MHz, one finds that $L_{nl} \sim L_v \sim 10^{-2}$ m, so that nonlinear and viscous effects are about the same magnitude.
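The water estimates can be reproduced with a few lines of arithmetic. This sketch takes $L_{nl} = (LM_ak)^{-1}$, $Re = \rho_0c_0/(\mu k)$ and $L_v = Re\,k^{-1}$, with $k = 2\pi f/c_0$. The inputs are our own assumptions: rough room-temperature values for water ($\rho_0 = 1000$ kg m$^{-3}$, $c_0 = 1500$ m s$^{-1}$, $\mu = 10^{-3}$ Pa s), $L = 3.5$, and $M_a = 10^{-4}$. As in the estimate above, only shear viscosity is included. The function names are hypothetical.

```python
import math

# Illustrative helpers (hypothetical names); property values are assumptions.
def nonlinear_length(L, mach, k):
    """Plane-wave shock-formation distance L_nl = 1 / (L * M_a * k)."""
    return 1.0 / (L * mach * k)

def viscous_length(rho, c, mu, k):
    """L_v = Re / k with acoustic Reynolds number Re = rho * c / (mu * k)."""
    return rho * c / (mu * k**2)

RHO_W, C_W, MU_W, L_W = 1000.0, 1500.0, 1.0e-3, 3.5
MACH = 1e-4   # about 2 atm overpressure: 2.03e5 Pa / (rho * c^2) = 9e-5

def lengths_in_water(freq_hz):
    """Return (L_nl, L_v) in metres for a wave of the given frequency in water."""
    k = 2 * math.pi * freq_hz / C_W
    return nonlinear_length(L_W, MACH, k), viscous_length(RHO_W, C_W, MU_W, k)

print(lengths_in_water(20e3)[0])   # about 34 m
print(lengths_in_water(100e6))     # both of order 1e-2 m
```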

A balance between weakly nonlinear, viscous, and nonuniform effects occurs in the limit $\varepsilon \to 0^+$, with $L_{nl} \sim L_v \sim H$. This gives

$$M_a = O(\varepsilon) \ \text{(small amplitude)}, \qquad kH = O(\varepsilon^{-1}) \ \text{(high frequency)}, \qquad Re = O(\varepsilon^{-1}) \ \text{(large acoustic Reynolds number)}.$$

These effects are significant over propagation distances of the order $(\varepsilon k)^{-1}$. Special cases (e.g. an inviscid fluid) may be obtained from this expansion by neglecting the appropriate terms.

We nondimensionalize (5.3) by the scale height $H$, and the density $\rho_*$ and sound speed $c_*$ at $x = 0$. With the above scaling assumptions, the nondimensional viscosity and thermal conductivity are of the order $\varepsilon^2$,

$$\nu = \varepsilon^2\tilde\nu, \qquad \kappa = \varepsilon^2\tilde\kappa.$$

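The weakly nonlinear expansion for a sound wave propagating in the positive $x$-direction is

$$\begin{pmatrix}\rho\\ u\\ S\end{pmatrix} = \begin{pmatrix}\rho_0(x)\\ 0\\ S_0(x)\end{pmatrix} + \varepsilon a\big[\varepsilon^{-1}\theta(x,t), x, t\big]\begin{pmatrix}\rho_0(x)\\ c_0(x)\\ 0\end{pmatrix} + O(\varepsilon^2),$$

where $\theta$ is the retarded time,

$$\theta = t - \int_0^x c_0(x')^{-1}\,dx'.$$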

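The transport equation for the wave amplitude is

$$a_t + c_0a_x - Laa_\theta + (2\rho_0)^{-1}(c_0\rho_0' + 3c_0'\rho_0)\,a = \tfrac12\delta c_0^{-2}a_{\theta\theta}, \tag{5.4}$$

where

$$\delta = \rho_0^{-1}\Big[\tilde\nu + (\gamma - 1)\frac{\tilde\kappa}{c_p}\Big].$$

The quantity $\varepsilon^2\delta$ is the "diffusivity of sound" [26],

$$\varepsilon^2\delta = \frac{\mu}{\rho_0}\Big(\frac43 + \frac{\mu_B}{\mu} + \frac{\gamma - 1}{Pr}\Big), \qquad Pr = \frac{c_p\mu}{\kappa} = \text{Prandtl number}.$$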

The coefficient of the nonlinear term,

$$L = 1 + \frac{\rho_0}{c_0}\frac{\partial c}{\partial\rho}(\rho_0,S_0),$$

is called the parameter of nonlinearity of the fluid. The "1" in $L$ is due to convection of the wave by its own velocity field. The remaining part is due to the variation of sound speed with density. $L = 1.2$ for air, and $L \approx 3.5$ for water. The nonlinearity of sound waves in air is mainly due to convection, but in water it is mainly due to the dependence of sound speed on density.

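The inviscid form of (5.4) may be written as

$$(\rho_0c_0^2a^2)_t + (\rho_0c_0^3a^2)_x - \big(\tfrac23L\rho_0c_0^2a^3\big)_\theta = 0, \tag{5.5}$$

which states that the average linearized wave energy density, $\rho_0c_0^2\langle a^2\rangle$, is conserved. For a uniform fluid, in which $\rho_0$ and $c_0$ are equal to one, (5.4) reduces to Burgers' equation,

$$a_t + a_x - Laa_\theta = \tfrac12\delta a_{\theta\theta}. \tag{5.6}$$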
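Equation (5.6) is easy to explore numerically. This is a minimal sketch, assuming the steady signalling form $a_x - Laa_\theta = \frac12\delta a_{\theta\theta}$ (no $t$-dependence) with $a$ $2\pi$-periodic in $\theta$. It marches in $x$ using a conservative Rusanov (local Lax-Friedrichs) flux for the nonlinear term and a centred difference for the thermoviscous term. This is a standard textbook discretization chosen for illustration, not a method from the text, and the step sizes are picked to respect the explicit stability limit. The function name is hypothetical.

```python
import numpy as np

# Illustrative solver (hypothetical name, not from the text).
def march_gbe(a0, L, delta, x_end, n_steps):
    """March a_x - L*a*a_theta = (delta/2)*a_thetatheta in x; a is 2*pi-periodic in theta.

    Written as a_x + f(a)_theta = (delta/2) a_thetatheta with f(a) = -L a^2 / 2.
    Explicit Euler in x, Rusanov flux for f, centred second difference for diffusion.
    """
    a = np.array(a0, dtype=float)
    dth = 2 * np.pi / a.size
    dx = x_end / n_steps
    for _ in range(n_steps):
        a_right = np.roll(a, -1)                                 # a_{i+1}
        f, f_right = -0.5 * L * a**2, -0.5 * L * a_right**2
        speed = L * np.maximum(np.abs(a), np.abs(a_right))      # local bound on |f'(a)|
        flux = 0.5 * (f + f_right) - 0.5 * speed * (a_right - a)  # flux at i + 1/2
        second_diff = (a_right - 2 * a + np.roll(a, 1)) / dth**2
        a = a - (dx / dth) * (flux - np.roll(flux, 1)) + dx * 0.5 * delta * second_diff
    return a

theta = 2 * np.pi * np.arange(256) / 256
a_in = np.sin(theta)
# Without dissipation this profile would break near x = 1/L (about 0.83).
a_out = march_gbe(a_in, L=1.2, delta=0.02, x_end=2.0, n_steps=400)
print(a_in.mean(), a_out.mean(), np.sum(a_in**2), np.sum(a_out**2))
```

The flux form keeps the mean of $a$ fixed. After the profile steepens into a smoothed shock, the discrete energy $\sum a^2$ decreases, as the dissipative right-hand side requires.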

For an exponentially stratified fluid, (5.4) gives

$$a_t + a_x - Laa_\theta - \tfrac12a = \tfrac12\delta_0e^{x}a_{\theta\theta}. \tag{5.7}$$

Here, $\delta_0$ is the diffusivity of sound at $x = 0$. (We assume that the viscosities and thermal conductivity depend only on temperature.) If nonlinearity and dissipation are neglected, the wave amplitude grows along a ray like $\exp(x/2)$. This follows from conservation of energy (5.5), since the density decays like $e^{-x}$, and the sound speed is constant.
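In dimensional terms, the isothermal model gives $c_0 = (\gamma gH)^{1/2}$ and linear growth $a \propto \exp(z/2H)$ along a vertical ray. The growth follows from keeping $\rho_0c_0^2\langle a^2\rangle$ flux constant while $\rho_0 \propto e^{-z/H}$. The sketch evaluates both for an assumed scale height $H = 8$ km, with $\gamma = 1.4$ and $g = 9.81$ m s$^{-2}$. These are typical textbook values, not numbers from the text, and the function names are hypothetical.

```python
import math

# Illustrative helpers (hypothetical names); H = 8 km is an assumed scale height.
def isothermal_sound_speed(H, gamma=1.4, g=9.81):
    """c_0 = (gamma g H)^(1/2) for an isothermal atmosphere with scale height H (m)."""
    return math.sqrt(gamma * g * H)

def linear_amplitude_ratio(z, H):
    """a(z)/a(0) on a vertical ray with rho_0 ~ exp(-z/H) and constant c_0: a ~ exp(z/(2H))."""
    return math.exp(z / (2.0 * H))

print(isothermal_sound_speed(8000.0))           # about 331 m/s
print(linear_amplitude_ratio(10000.0, 8000.0))  # about 1.87 after 10 km
```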

For our second example, we consider spherical sound waves. Geometrical acoustics applies at distances $r$ which are much greater than a wavelength. The NGA solution for an outgoing spherical wave propagating through a uniform fluid is

$$\begin{pmatrix}\rho\\ u\\ S\end{pmatrix} = \begin{pmatrix}1\\ 0\\ 0\end{pmatrix} + \varepsilon a\big[\varepsilon^{-1}(t - r), r, t\big]\begin{pmatrix}1\\ 1\\ 0\end{pmatrix} + O(\varepsilon^2),$$

where $u$ is the radial velocity component. The transport equation is

$$a_t + a_r - Laa_\theta + \frac{n-1}{2r}\,a = \tfrac12\delta a_{\theta\theta}, \tag{5.9}$$

where $n$ is the number of space dimensions ($n = 2$ for cylindrical waves and $n = 3$ for spherical waves). If $\delta = 0$, this has the conservative form

$$(r^{n-1}a^2)_t + (r^{n-1}a^2)_r - \big(\tfrac23Lr^{n-1}a^3\big)_\theta = 0.$$

The general solution of the linearized equation is therefore

$$a = a_0(\theta, r - t)\,r^{-(n-1)/2}.$$

The cross-sectional area of a cone of rays at a distance $r$ is proportional to $r^{n-1}$. Thus this formula states that (amplitude$^2$ × ray tube area) is constant along a ray, which also follows from conservation of wave energy.
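The spreading law is illustrated below together with a standard consequence that is our addition, not the text's. In the inviscid steady form of (5.9), substitute $a = (r_0/r)^{(n-1)/2}b$. This removes the spreading term and leaves $b_r - L(r_0/r)^{(n-1)/2}bb_\theta = 0$. Nonlinear distortion therefore accumulates over the stretched distance $\int_{r_0}^{r}(r_0/s)^{(n-1)/2}\,ds$, which grows only logarithmically for spherical waves. The function names are hypothetical.

```python
import math

# Illustrative helpers (hypothetical names, not from the text).
def spreading_amplitude(a_ref, r_ref, r, n):
    """Linear NGA amplitude a(r) = a_ref * (r_ref / r)^((n-1)/2) in n space dimensions."""
    return a_ref * (r_ref / r) ** ((n - 1) / 2)

def distortion_distance(r0, r, n):
    """Stretched distance int_{r0}^{r} (r0/s)^((n-1)/2) ds over which steepening accumulates."""
    if n == 1:
        return r - r0
    if n == 2:
        return 2.0 * math.sqrt(r0) * (math.sqrt(r) - math.sqrt(r0))
    if n == 3:
        return r0 * math.log(r / r0)
    raise ValueError("n must be 1, 2 or 3")

print(spreading_amplitude(1.0, 1.0, 10.0, 3))   # spherical: 0.1
print(distortion_distance(1.0, 100.0, 3))       # about 4.6 source radii instead of 99
```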

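For a sound wave propagating through a nonuniform, moving fluid the NGA solution of (5.1) is

$$\begin{pmatrix}\rho\\ \mathbf u\\ S\end{pmatrix} = \begin{pmatrix}\rho_0\\ \mathbf u_0\\ S_0\end{pmatrix} + \varepsilon a\big[\varepsilon^{-1}\phi(\mathbf x,t), \mathbf x, t\big]\begin{pmatrix}\rho_0\\ c_0\hat{\mathbf k}\\ 0\end{pmatrix} + O(\varepsilon^2). \tag{5.10}$$

In (5.10), $\rho_0(\mathbf x,t)$, $\mathbf u_0(\mathbf x,t)$, $S_0(\mathbf x,t)$ are smooth solutions of the nondissipative form of (5.1), and $c_0(\mathbf x,t)$ is the corresponding sound speed. The local frequency and wavenumber are

$$\omega = -\phi_t, \qquad \mathbf k = \nabla\phi,$$

and $\Omega$ is the frequency in a reference frame moving with the fluid,

$$\Omega = \omega - \mathbf u_0\cdot\mathbf k = -d_t\phi,$$

where

$$d_t = \partial_t + \mathbf u_0\cdot\nabla.$$

The eikonal equation for $\phi$ is

$$\Omega^2 = c_0^2|\nabla\phi|^2.$$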

The  transport  equation  for  a  is 


(o.ll) 


cLfj-  +  — .-jk'Va  +  — fl  aa 


k- 


0 


0  2 
f^T  +  CQ“divk  ^^^(PqCq  ) 

+  { - 211 - 


k-VlPgCg  ) 

?y - }a 


If $\delta = 0$, (5.11) can be put in the conservative form,

Aj.  +  div[(Ug+R2k)A]  =  0, 

Ic 


-(^“a 
2CK  a^^ 


where 


A  =  /5QCQ“na“, 
is  the  wave  action. 

Chin  et  al  [4]  have  used  this  expansion  for  (5.1)  with  heat  sources,  to  study 
the  nonlinear  development  of  acoustic  instabilities. 


5.3  Diffraction 


5.3.1  Transverse  diffraction 

Transverse versions of the GBEs in section 5.2 are most easily obtained by the following heuristic argument. We consider plane waves in two space dimensions for simplicity. In a stationary fluid, with sound speed one, the linearized dispersion relation is

$$\omega^2 = k^2 + \ell^2 + m^2. \tag{5.12}$$

In (5.12), $k$, $\ell$ and $m$ are the $x$, $y$ and $z$ components of the wavenumber vector. For waves propagating in the positive $x$-direction, with slow variation in the $y$ and $z$-directions,

$$\Big|\frac{\ell}{k}\Big|,\ \Big|\frac{m}{k}\Big| \ll 1.$$

Expanding (5.12) gives

$$\omega \sim k + (2k)^{-1}(\ell^2 + m^2). \tag{5.13}$$
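The size of the neglected terms in (5.13) can be checked directly. Since $\sqrt{1+s^2} = 1 + s^2/2 - s^4/8 + \dots$, the paraxial frequency exceeds the exact one by about $k s^4/8$, where $s = (\ell^2+m^2)^{1/2}/k$. The sketch compares the two relations numerically, with unit sound speed as in (5.12). The function names are hypothetical.

```python
import math

# Illustrative helpers (hypothetical names, not from the text).
def omega_exact(k, l, m):
    """Exact dispersion relation (5.12) with unit sound speed."""
    return math.sqrt(k * k + l * l + m * m)

def omega_paraxial(k, l, m):
    """Paraxial approximation (5.13): omega ~ k + (l^2 + m^2) / (2k)."""
    return k + (l * l + m * m) / (2.0 * k)

for s in (0.3, 0.1, 0.03):
    err = omega_paraxial(1.0, s, 0.0) - omega_exact(1.0, s, 0.0)
    print(s, err, s**4 / 8)    # error tracks k*s^4/8 as s -> 0
```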

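The transverse version of (5.2) and (5.6), whose linearized dispersion relation agrees with (5.13), is

$$\begin{pmatrix}\rho\\ u\\ S\end{pmatrix} = \begin{pmatrix}1\\ 0\\ 0\end{pmatrix} + \varepsilon a(\theta,\eta,\zeta,x,y,z,t)\begin{pmatrix}1\\ 1\\ 0\end{pmatrix} + O(\varepsilon^2),$$

where $a(\theta,\eta,\zeta,x,y,z,t)$ satisfies

$$\partial_\theta\big(a_t + a_x - Laa_\theta - \tfrac12\delta a_{\theta\theta}\big) - \tfrac12(a_{\eta\eta} + a_{\zeta\zeta}) = 0.$$

For an axially symmetric beam, this gives

$$\partial_\theta\big(a_t + a_x - Laa_\theta - \tfrac12\delta a_{\theta\theta}\big) - \tfrac12\big(a_{\varrho\varrho} + \varrho^{-1}a_\varrho\big) = 0, \tag{5.14}$$

where $a = a(\theta,\varrho,x,t)$ and

$$\varrho = (\eta^2 + \zeta^2)^{1/2} = \varepsilon^{-1/2}r(y,z), \qquad r = (y^2 + z^2)^{1/2}.$$

This equation is not obtained directly by using $y = r$, which gives (5.14) without the term proportional to $\varrho^{-1}a_\varrho$. This is because $r$ is not smooth at $r = 0$. The two expansions agree when $r \gg \varepsilon^{1/2}$.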

For nonplanar waves (in a uniform, stationary fluid for simplicity), $\phi(\mathbf x,t)$ solves the eikonal equation

$$\omega^2 = |\mathbf k|^2, \qquad \omega = -\phi_t, \quad \mathbf k = \nabla\phi,$$

and $\psi(\mathbf x)$ must satisfy

$$\mathbf k\cdot\nabla\psi = 0.$$


The  transport  equation  is 

+  u;  ^k-Va  +  - ^rjr]  " 

For  the  case  of  a  sound  wave  propagating  vertically  upwards  through  an 
exponentially  stratified  fluid  and  diffracting  horizontally,  the  transverse  equation 
corresponding  to  (5.7)  is 

^^^^t  ~  hrjr]  " 

where  rj  =  The  change  of  variables  (3.6),  with  an  additional  rescaling  to 

remove  constant  factors,  puts  this  equation  in  the  form 
J[U,  +  UU^  -  tuj  +  t-‘Uyy  =  0.  I  >  0. 

For an outgoing cylindrical wave diffracting in the angular direction, the transverse equation corresponding to (5.9), with $n = 2$, is

+  a^  -f  ^aa^  +  -  ^(5a^^)  +  ^r  “a^^  =  0, 

where  rj  =  f'*^/“tan''^(^).  The  change  of  variables  (3.6)  leads  to 

Tj^fu,  +  uu  -  tu  1  +  t  ^u  =  0. 
t  X  .xx^  yy 

As shown in section 3.1, this equation is equivalent to the Kuznetsov equation (3.8).


5.3.2  Caustics 

Suppose  that  a  nonplanar  sound  wave  propagates  through  a  uniform  fluid  at  rest  and 
forms  a  smooth,  convex  caustic  at  cfx)  =  0.  The  weakly  nonlinear  solution  is 

/’Iff'  f  1  ■ 

u  j  =  0  +  f“/’^a[f~^c»(x,t),e~“'^'^t'(x),x.t]  k(x)  +  0(f). 

S  J  I  0  j  [  0  . 

Here. 
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Ci>(x,t)  =  y(x)  -  t, 

and  the  sound  speed  and  the  density  of  the  unperturbed  fluid  are  one.  The  functions 
^  and  ip  are  determined  from 

|V(^-|2  +  =  1, 

and  ?L{9.r],x,t)  satisfies 

^[(Ma-7;)a^  +  a^^  =  0, 

where 

M  =  I  Vi/)|~“(7+1). 

For example, consider a circular caustic at $r = R$, where $(r,\alpha)$ are polar coordinates in the plane. Then

^  =  Ra.  =  (r2.R2)l/2  _  Reos-l(f). 

On  r  =  R,  this  implies  that 

M  =  2‘^/^R-/^  (7+1). 

5.4  Interacting  waves 


5.4.1  Reflection  of  a  sound  wave  off  an  entropy  wave 

The weakly nonlinear expansion for interacting wave solutions of (5.3) has the
amplitude $a_2$), and a sound wave moving to the right (with amplitude $a_3$).

The  resonant  interaction  equations  for  the  wave  amplitudes  are 
aitiO)  -  aj,^(«)  - 
-  jk,  1  im 

i-’X  "-I  J-' 

O' 

(5.15)  a.^^  =  k.y~K 

a^J^)  +  +  k3^a3(^)a3^^) 

1  —  X  ^6  1-' 

In (5.15), we have not shown the dependence of $a_j$ on $(x,t)$ explicitly, and $a_j'(\theta) = \partial_\theta a_j(\theta)$. These equations consist of a pair of Burgers equations for the sound waves coupled by a correlation with the entropy wave. The entropy wave is determined independently of the sound waves from the diffusion equation (5.15b). Thus, (5.15) describes the reflection of sound waves off an entropy perturbation.

We obtain a simpler version of (5.15) if we neglect thermo-viscous effects ($\mu = \kappa = 0$), and assume that there are no spatial modulations ($a_j$ independent of $x$). Then from (5.15b), $a_2(\theta)$ is an arbitrary function of $\theta$, which we assume to be $2\pi$-periodic in $\theta$. We shall look for $2\pi$-periodic solutions for $\{a_1, a_3\}$. The correlations in (5.15) are only $2\pi$-periodic if

$$\frac{k_2}{2k_1} = n, \qquad \frac{k_2}{2k_3} = m,$$

where $n$ and $m$ are integers. Then, nondimensionalizing lengths so that $k_2 = 2$, the wavenumbers are

$$k_1 = \frac1n, \qquad k_2 = 2, \qquad k_3 = \frac1m,$$

and the corresponding frequencies are
They satisfy the resonance condition,

$$k_2 = nk_1 + mk_3, \qquad \omega_2 = n\omega_1 + m\omega_3.$$

Defining  new  variables 

u(^,t)  =  k2-^^^a2(^,t), 

v(^,t)  =  -kj^^^aj^(^,t), 

K(«)  =  h2'(fl, 

in  (5.15),  gives  the  following  pair  of  integro-differential  equations  for  u(x,t)  and 
v(x.t), 

“t  +  ““x  K(mx  +  ny)v(y,t)  dy  =  0, 

(o.l6) 

^'t  ^’''x  +  ?  2^  ^0^  niy)u(y,t)  dy  =  0. 

These  equations  (with  |m|  =  |n|  =1)  have  been  studied  analytically  and 
numerically  by  Majda,  Rosales,  and  Schonbeck  [31].  For  a  sinusoidal  entropy 


distribution,  and  m  =  -n  =  1,  (5.16)  becomes 

^t  ^  ^^x  ^  “  y)v(y,t)  dy  =  0, 

(5.17)  ^ 

^'t  ^  ^'"x  ^  2^  ^0^  ■  y)u(y,t)  dy  =  0. 

Pego [38] found an exact travelling wave solution of (5.17), namely

$$u = c + \beta[1 + \alpha\cos(x - ct)]^{1/2}, \qquad v = c + \beta[1 + \sigma\alpha\cos(x - ct)]^{1/2}, \tag{5.18}$$

where $\sigma \in \{-1, +1\}$, $0 < \alpha < 1$, and

/3(a)  =  ^  /q^  cosy(l  +  acosy)^/\v, 
c(a)  =  -  ^(a)^  /q^  (1  +  acosy)^/^ 
An interesting feature of this solution is that waves exist only up to a maximum amplitude, corresponding to $\alpha = 1$. The limiting wave has a cusp at its crest or trough.

5.4.2  Wave  induced  combustion 

The combustion equations contain source terms which are rapidly varying when the activation energy is large. Combining weakly nonlinear, high frequency asymptotics with high activation energy asymptotics leads to equations of the type described in section 4.2 [30]. The mean field equations are

-  -  -1  T 

aft  -  a^^  =  (27)  e  <  exp(7-l)a  >, 

a.2j  =  <  exp(7-l)a  >, 

-  -  -1  T 

^3t  ^  ^3x  ^  ®  exp{7-l)a  >, 

and  the  equation  for  the  wave  is 

a^  -  a,^  -  ^[(7+l)a|  +  ('}^l)a,^  +  (7-3)a2]a^  — ^^^aa^ 

-1  T 

=  (27)  e  [exp(7-l)a  --  <  e.xp(7-l)a  >]. 

Here, $\varepsilon\bar a_1(x,t)$, $\varepsilon\bar a_2(x,t)$, $\varepsilon\bar a_3(x,t)$ are the mean field perturbations in the left-moving sound wave, the entropy wave, and the right-moving sound wave, and
T  =  (7-i)(c^  +  a.,  +  a^). 

The  amplitude  of  the  left-moving  high  frequency  sound  wave  is  ea(^-^,x,t),  where 

$a(\theta,x,t)$ is a zero-mean almost periodic function of $\theta$, and

$$\langle\exp(\gamma-1)a\rangle(x,t) = \lim_{T\to\infty}\frac1T\int_0^T\exp[(\gamma-1)a(\theta,x,t)]\,d\theta.$$

The solution for the mean field blows up in finite time. The blow-up time is shorter when $a \ne 0$ than when $a = 0$. This describes one way that a sound wave could enhance the detonation of a reacting gas.
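One way to see why an oscillation strengthens the mean reaction term is Jensen's inequality. For a zero-mean $a$, $\langle\exp[(\gamma-1)a]\rangle \ge \exp(0) = 1$, and the inequality is strict whenever $a \ne 0$. This is consistent with the shorter blow-up time stated above, though it is not a proof of it. The sketch assumes a purely sinusoidal wave $a = A\sin\theta$, which is our choice. It computes the average over one period with the periodic trapezoidal rule and compares the result with the closed form $I_0((\gamma-1)A)$, the modified Bessel function available as `numpy.i0`. The helper name is hypothetical.

```python
import numpy as np

# Illustrative helper (hypothetical name, not from the text).
def phase_average(g, a, n=128):
    """<g(a)> = lim (1/T) int_0^T g(a(theta)) dtheta for a 2*pi-periodic a(theta).

    For periodic a the limit equals the mean over one period, computed here with the
    periodic trapezoidal rule (spectrally accurate for smooth periodic integrands).
    """
    theta = 2 * np.pi * np.arange(n) / n
    return float(np.mean(g(a(theta))))

gamma, amplitude = 1.4, 2.0
avg = phase_average(lambda s: np.exp((gamma - 1) * s), lambda th: amplitude * np.sin(th))
print(avg, np.i0((gamma - 1) * amplitude))   # both about 1.1665 > 1
```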
Phase-Change Problem for Hyperbolic
Abstract

The phase-change problem is discussed for a hyperbolic heat transfer model under the traditional assumption that the temperatures on the two sides of the interface are equal and given. Necessary and sufficient conditions are given for the local solution to exist and be unique. Global existence is discussed for some special cases.


1  Introduction 

As is well known, the classical mathematical model for heat transfer and diffusion phenomena is of parabolic type, in particular, the heat equation. These models are based upon Fourier's law of heat conduction:

$$\mathbf q = -k\nabla u, \tag{1.1}$$

where $\mathbf q$ is the heat flux vector, $k$ the thermal conductivity, and $u$ the temperature. In most cases, this kind of model works well and gives satisfactory results. But one inherent shortcoming of the parabolic model is that it implies a physically unacceptable infinite propagation speed. This might be very important in certain models with large variations in temperature or large temperature gradients.
In order to avoid this difficulty, it has long been proposed that instead of the Fourier law, one should assume that the heat flux responds to the temperature gradient after a delay period $\tau > 0$, i.e.

$$\mathbf q(t + \tau) = -k\nabla u(t). \tag{1.2}$$

Taking the first order approximation, one has the following relaxation relation:

$$\tau\mathbf q'(t) + \mathbf q(t) = -k\nabla u(t). \tag{1.3}$$

Replacing the Fourier law with this relation and combining it with the law of conservation of energy,

$$c\rho u_t + \nabla\cdot\mathbf q = 0, \tag{1.4}$$

with $\rho$ being the density and $c$ the specific heat, one has the hyperbolic telegrapher's equation

$$\tau u_{tt} + u_t - \alpha\Delta u = 0. \tag{1.5}$$

Here, $\alpha = k/(\rho c)$ is the diffusivity.
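The finite propagation speed of (1.5) shows up clearly in a computation. This is a minimal one-dimensional sketch under our own assumptions: zero initial velocity, $u = 0$ at both ends, and an explicit three-level finite-difference scheme run at Courant number one. With that choice the numerical domain of dependence matches the characteristic speed $(\alpha/\tau)^{1/2}$. A compactly supported initial bump then stays exactly zero outside the cone, whereas the heat-equation kernel is positive everywhere for $t > 0$. The scheme and the function names are illustrative, not from the text.

```python
import math
import numpy as np

# Illustrative helpers (hypothetical names, not from the text).
def telegraph_1d(u0, tau, alpha, dx, n_steps):
    """March tau*u_tt + u_t - alpha*u_xx = 0 with u_t(x, 0) = 0 and u = 0 at both ends.

    Three-level explicit scheme at Courant number 1 (dt = dx / c, c = sqrt(alpha/tau)).
    Returns (u, t_final).
    """
    c = math.sqrt(alpha / tau)
    dt = dx / c
    u_old = np.array(u0, dtype=float)   # level n-1 (equal to level 0: zero initial velocity)
    u = u_old.copy()                    # level n
    denom = tau / dt**2 + 1.0 / (2.0 * dt)
    for _ in range(n_steps):
        lap = np.zeros_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        u_new = (tau * (2.0 * u - u_old) / dt**2 + u_old / (2.0 * dt) + alpha * lap) / denom
        u_new[0] = u_new[-1] = 0.0
        u_old, u = u, u_new
    return u, n_steps * dt

def heat_kernel(x, t, alpha):
    """Fundamental solution of u_t = alpha*u_xx; positive for every x when t > 0."""
    return math.exp(-x * x / (4.0 * alpha * t)) / math.sqrt(4.0 * math.pi * alpha * t)

idx = np.arange(401)
u0 = np.maximum(0.0, 1.0 - ((idx - 200) / 10.0) ** 2)   # bump supported on cells 191..209
u, t = telegraph_1d(u0, tau=0.1, alpha=1.0, dx=0.01, n_steps=100)
nonzero = np.nonzero(u)[0]
print(nonzero.min(), nonzero.max(), heat_kernel(2.0, t, 1.0))
```

Each step widens the support by exactly one cell, that is, by $c\,\Delta t$. This is the discrete counterpart of the finite speed $(\alpha/\tau)^{1/2}$.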

In particular, for $\tau = 0$, we get the classical heat equation. There are already many works on the relation between the classical heat equation and the telegrapher's equation. For the Cauchy problem or the initial-boundary value problem, the solution of the telegrapher's equation tends uniformly to the solution of the classical heat equation as $\tau \to 0$.

On the other hand, for the classical heat equation, an interesting and important problem in both theoretical research and applications is the famous Stefan problem. The Stefan problem consists of finding not only the temperature distribution $u(x,t)$, but also the surface along which a phase change occurs. It is only natural that one should study problems of Stefan type for the hyperbolic heat transfer model.

In [5], Solomon and others gave a formulation of the hyperbolic Stefan problem based upon the traditional assumption that the temperatures on the two sides of the phase change boundary are given and equal. Also in their paper, an explicit solution was given where the phase change front propagates faster than the sound speed and consequently is physically unacceptable.

Partly in order to avoid this difficulty, several authors [1][2][4][6] suggested other formulations of the phase change condition based upon the Rankine-Hugoniot conditions for hyperbolic conservation laws. In particular, in this type of formulation, the temperature across the phase-change surface is discontinuous. In all these papers, the original question of Solomon and others in [5] remained unanswered.

In this paper, we study the Stefan problem for the hyperbolic heat model in the classical framework, where the temperature is assumed to be given and continuous across the phase-change surface. Necessary and sufficient conditions are given for local existence and uniqueness. In particular, for the example in [5], a natural mathematical explanation is given. The global solution is also discussed for some examples, but the conditions that guarantee global existence are only sufficient ones, and it remains open to what extent they could be relaxed. Also, we treat here only the case of one space dimension. For the higher dimensional case, the only result now available is in [4], where a weak solution was given.
3. Convective boundary condition

q{xQ,  t)  =  h[u^{t)  -  u(xo,  t)],  t>  0.  (2.5) 

If $x_0 = 0$, then no initial condition is needed. If $x_0 < 0$, initial conditions should be given:

$$u(x,0) = u_0(x), \qquad q(x,0) = q_0(x), \qquad x_0 < x < 0. \tag{2.6}$$

Now for the case of $x_0 < 0$, we have the following result:

Theorem 2.1 If $u_0, q_0 \in C^1(-\infty, 0]$ and satisfy the corresponding compatibility condition at $x = 0$, $t = 0$, then the problem (2.1)(2.2) coupled with any one of the boundary conditions in (2.3)-(2.5) has a unique local solution provided that

$$|q_0(0)| < \rho H(\alpha/\tau)^{1/2}. \tag{2.7}$$

The idea of the proof is to introduce the new variables

$$ \tilde x = x - \phi(t), \qquad \tilde t = t \tag{2.8} $$

to transform the original free boundary problem into a fixed boundary problem, and then to solve the transformed problem by integration along characteristics and linear iteration.
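As an illustration of what the change of variables (2.8) accomplishes (a generic chain-rule computation, not taken from the paper), write $u(x,t) = \tilde u(\tilde x, \tilde t)$. Then

$$ \frac{\partial u}{\partial t} = \frac{\partial \tilde u}{\partial \tilde t} - \phi'(\tilde t)\,\frac{\partial \tilde u}{\partial \tilde x}, \qquad \frac{\partial u}{\partial x} = \frac{\partial \tilde u}{\partial \tilde x}, $$

and similarly for $q$. The free boundary $x = \phi(t)$ becomes the fixed line $\tilde x = 0$, while the unknown front speed $\phi'(\tilde t)$ now appears as a coefficient in the transformed equations.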

The  condition  (2.7)  is  also  necessary  in  the  sense  that  if  it  is  not  true, 
one  would  have  either  non-uniqueness  or  non-existence  of  the  solution. 

In particular, Theorem 2.1 explains the physically unacceptable explicit solution given as an example in [5]: there (2.7) is not satisfied, and consequently the solution is not unique.

If $x_0 = 0$, the situation is a little different from the previous case:

Theorem 2.2 If $u_0, q_0 \in C^1(-\infty, 0]$ and satisfy the corresponding compatibility condition at $x = 0$, $t = 0$, then the problem (2.1)(2.2), coupled with either of the boundary conditions (2.4), (2.5), has a unique local solution $(u, q, \phi) \in C^1 \times C^1 \times C^1$ if

$$ 0 < q_0(0) < \rho H \left(\frac{a}{\tau}\right)^{1/2}. \tag{2.9} $$

The problem (2.1)(2.2)(2.3) is not well-posed.
The proof of this theorem is again achieved by integration along characteristics and linear iteration. In doing so, we make use of the theorem of Zhao [7] on the well-posedness of boundary value problems for hyperbolic systems in corner domains, which extends the result of [3].

The non-well-posedness of the problem (2.1)(2.2)(2.3) follows from the fact that $q(0,0)$ is not uniquely determined; consequently the solution is not unique.


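2.2 Two-phase problem

Similar to the one-phase case, the two-phase Stefan problem can be formulated as follows:

$$ \begin{cases} \tau_1 \partial_t q_1 + k_1 \partial_x u_1 = 0, \\ c_1 \partial_t u_1 + \partial_x q_1 = 0, \end{cases} \qquad x < \phi(t), \tag{2.10} $$

$$ \begin{cases} \tau_2 \partial_t q_2 + k_2 \partial_x u_2 = 0, \\ c_2 \partial_t u_2 + \partial_x q_2 = 0, \end{cases} \qquad x > \phi(t), \tag{2.11} $$

$$ \begin{cases} u_1(\phi(t), t) = u_2(\phi(t), t) = 0, \\ \rho H \phi'(t) = (q_1 - q_2)(\phi(t), t), \end{cases} \tag{2.12} $$

$$ \begin{cases} u_1(x,0) = u_{10}(x), \quad q_1(x,0) = q_{10}(x), & x < 0, \\ u_2(x,0) = u_{20}(x), \quad q_2(x,0) = q_{20}(x), & x > 0, \\ \phi(0) = 0. \end{cases} \tag{2.13} $$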

Theorem 2.3 If $u_{10}, q_{10}, u_{20}, q_{20} \in C^1$ and satisfy the corresponding compatibility condition at $x = 0$, $t = 0$, then the problem (2.10)-(2.13) has a unique local solution $(u_1, q_1, u_2, q_2, \phi)$ if

$$ |(q_{20} - q_{10})(0)| < \min\left\{ \rho H \left(\frac{a_1}{\tau_1}\right)^{1/2},\ \rho H \left(\frac{a_2}{\tau_2}\right)^{1/2} \right\}. \tag{2.14} $$

The proof of this theorem is similar to the proof of Theorem 2.1.
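The bounds in (2.7), (2.9) and (2.14) can be read as a speed comparison: by the Stefan condition the initial front speed is a flux (or flux jump) divided by $\rho H$, and $(a/\tau)^{1/2}$ is the characteristic speed of the hyperbolic heat model, so local solvability requires the front to start out slower than the heat waves, i.e. noncharacteristic. The sketch below simply evaluates these inequalities, assuming the bounds read $\rho H (a/\tau)^{1/2}$; the function names are hypothetical and not from the paper.

```python
import math


def characteristic_speed(a: float, tau: float) -> float:
    """Speed sqrt(a / tau) of heat waves in the hyperbolic heat model
    (a: thermal diffusivity, tau: relaxation time)."""
    return math.sqrt(a / tau)


def one_phase_ok(q0_at_0: float, rho_H: float, a: float, tau: float,
                 x0_is_zero: bool = False) -> bool:
    """Condition (2.7) for x0 < 0, or condition (2.9) for x0 = 0."""
    bound = rho_H * characteristic_speed(a, tau)
    if x0_is_zero:
        # (2.9): the front must start moving, but slower than the heat waves
        return 0.0 < q0_at_0 < bound
    # (2.7): only the magnitude of the initial front speed matters
    return abs(q0_at_0) < bound


def two_phase_ok(q10_at_0: float, q20_at_0: float, rho_H: float,
                 a1: float, tau1: float, a2: float, tau2: float) -> bool:
    """Condition (2.14): the flux jump must stay below rho*H times the
    slower of the two characteristic speeds."""
    bound = rho_H * min(characteristic_speed(a1, tau1),
                        characteristic_speed(a2, tau2))
    return abs(q20_at_0 - q10_at_0) < bound


if __name__ == "__main__":
    # Illustrative nondimensional numbers only.
    print(one_phase_ok(1.0, rho_H=1.0, a=4.0, tau=1.0))      # True  (bound 2)
    print(one_phase_ok(3.0, rho_H=1.0, a=4.0, tau=1.0))      # False
    print(two_phase_ok(0.0, 1.5, 1.0, 1.0, 1.0, 4.0, 1.0))   # False (bound 1)
```

Taking the minimum in `two_phase_ok` mirrors (2.14): the front must be noncharacteristic with respect to both phases at once.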


3  Global  solution 

For the global solution of the hyperbolic Stefan problems discussed in Section 2, we consider only some special cases.

For  the  one  phase  problem  with  the  imposed  temperature  condition  on 
the  fixed  boundary  (2.1)-(2.3),(2.6),  we  have 
Theorem 3.1 If $u_0, q_0, \hat u$ are sufficiently smooth and the corresponding compatibility conditions are satisfied at $(0,0)$ and $(x_0,0)$, and if, in addition,

v!^{i)  >  0, 

1*0  ±  kar^qQ  <  0, 

0  <  ^(uo(0)  +  fcar^o(O))  <  1. 

Then the problem (2.1)-(2.3),(2.6) has a unique global smooth solution $(u, q, \phi)$.

For the proof of this global result, we follow the approach of J. Greenberg in [2]. As usual, by linear transformations of the independent variables and the unknown functions, the problem can be reduced to diagonal form. Global existence is proven if we can show that the free boundary remains uniformly noncharacteristic for all time. This in turn can be achieved by a monotonicity argument similar to the one employed in [2].

Similarly, global existence for the two-phase Stefan problem can be stated and proven if the same kind of monotonicity of the initial data is assumed and the relaxation times $\tau$ and the diffusivities $a$ in the two phases are assumed to be equal.
Abstract

Recently, it has become possible to model the progress of fires in large structures, such as buildings, with considerable accuracy. However, many of the physical processes involved in large fires, such as the rate of spread of the fire and the balance between convective and radiative heating and cooling effects, are difficult to model accurately from first principles.

In this work, an improved approximation for the flux onto burning objects from flames is derived and used to obtain an expression describing the rate of fire spread. The new formulations developed in this work are incorporated into a compartment fire model. Some results from a numerical solution of the equations governing fires for a specific case are presented.


1  Introduction 

Fueled by recent events such as the fire in the King’s Cross underground station, there has been considerable interest in modelling the spread of fires in the interiors of large structures such as buildings and large compartments. In general, there are two types of fire models: stochastic and deterministic. An example of the first type is the model originally developed at Worcester Polytechnic Institute to model the spread of fire in buildings, and subsequently substantially modified at DREV to model the spread of fires on board ships (Fitzgerald 1984). Because this model is stochastic, one must specify the probabilities of thermal and structural failures of the walls, the probability of self-extinction of a fire within a given compartment, and the probability of success of attempts to extinguish the fire.

The other approach is deterministic: the initial conditions, such as fuel load, ignition temperatures, ventilation parameters, and dimensions of the compartment, are specified, and the progress of the fire is modelled by simulating the physics of the fire. The deterministic models are further subdivided into two subclasses: computational fluid dynamics (CFD) models and zone models. The CFD models divide the volume of interest, such as the interior of a compartment, into several thousand cells, and calculate the conditions within each cell, e.g. the gas species and temperatures, incident and emitted radiation, and air movement. This method has the disadvantage of being very computationally intensive, requiring the use of a supercomputer.

The second type of deterministic fire model is the zone model. In this class of models, the volume of interest is divided into two or more zones, and the model calculates average conditions within each zone. The most usual choice of zones represents the layer of hot gases that tends to accumulate near the ceiling and the lower layer of relatively cool gases. CFC V, the fifth Harvard Computer Fire Code (Mitler 1985, Mitler and Emmons 1981), is a two-zone model originally written to predict the spread of fires within buildings. Even in these relatively simple forms, the physical processes involved in burning require large numbers of calculations, with all the problems attendant upon the solution of non-linear systems of equations.

In this paper, the methods used in CFC V to calculate the radiant flux from flames, flame spread and the temperatures of objects are described. The modifications to the algorithms for calculating the radiant flux are then described, together with the adaptation of Quintiere’s (1981) model of flame spread to a form suitable for incorporation into CFC V, and numerical examples are presented.


2 Calculation of Heat Flux, Temperature and Flame Spread

2.1  Review  of  Previous  Work 


In CFC V, a flame is modelled by a cone of grey gas at a temperature of 1260 K, with radius $r_f$, height $h_f$, and semi-apex angle $\psi$. The power per steradian per unit volume, $d^3Q$, from an emitting element $dV$ is given by

$$ d^3Q = g\,\frac{dV}{4\pi}, \tag{1} $$

where $g$ is given by

$$ g = 4\kappa\sigma T_f^4, \tag{2} $$

and $\kappa$ is the absorptivity of the flame gas in m$^{-1}$, $T_f$ the flame temperature in K, and $\sigma = 5.67 \times 10^{-8}$ W/m$^2$ K$^4$ the Stefan-Boltzmann constant. The flux per unit volume per steradian at a point P a distance $p$ from the emitting element $dV$ is given by

$$ d^3\vec\Phi = \frac{g}{4\pi}\,\frac{dV}{p^2}\,\hat p, \tag{3} $$

as shown in Fig. 1. The flux normal to the surface, $d^3\Phi_n$, is given by

$$ d^3\Phi_n = \frac{g}{4\pi}\,\frac{dV}{p^2}\,\hat p \cdot \hat n. \tag{4} $$
As can be seen from Fig. 1, for an element of volume located on a disc at a height $x$ above the surface containing the point P and at a radial coordinate $z$,

$$ \hat n \cdot \hat p = \frac{x}{p} \tag{5} $$

and

$$ p^2 = s^2 + z^2 - 2sz\cos\phi', \tag{6} $$

where $s = (x^2 + L^2)^{1/2}$ is the distance from the centre of the disc at $x$ to P, $L$ is the distance from the centre of the flame base to P, and $\phi'$ is the angle between $\vec s$ and $\vec z$. Again referring to Fig. 1, if $\phi$ is the usual azimuthal coordinate and $\theta'$ is the angle between $\vec s$ and $\vec L$,

$$ \vec z = z\cos\phi\,\hat z + z\sin\phi\,\hat y, \qquad \vec s = s\cos\theta'\,\hat z - s\sin\theta'\,\hat x. \tag{7} $$

Hence,

$$ \cos\phi' = \frac{\vec s \cdot \vec z}{sz} = \cos\theta'\cos\phi = \frac{L}{s}\cos\phi. \tag{8} $$

Substituting eqs. (5), (6) and (8) into eq. (4) and using $dV = z\,dx\,dz\,d\phi$, one obtains

$$ d^3\Phi_n = \frac{g}{4\pi}\,\frac{xz}{(x^2 + z^2 + L^2 - 2Lz\cos\phi)^{3/2}}\,dx\,dz\,d\phi \tag{9} $$

(Mitler 1978).

The normal flux from the whole flame at P, $\Phi_n$, is then given by

$$ \Phi_n = \frac{g}{4\pi}\int_{x_a}^{x_b}\int_0^{z(x)}\int_0^{2\pi} \frac{xz}{(x^2 + z^2 + L^2 - 2Lz\cos\phi)^{3/2}}\,d\phi\,dz\,dx, \tag{10} $$

where $z(x)$, the width of the cone as a function of height $x$, is given by

$$ z(x) = r_f + (x_a - x)\tan\psi. \tag{11} $$

Usually, $x_a = 0$ and $x_b = h_f$; however, if the cone of flame extends into the layer of smoke at a height $x_l$ above the fire base, it is (perhaps unrealistically) assumed that the portion of the flame within the layer makes a negligible contribution to the flux at P. Under those circumstances, $x_b = h_f - x_l$. If P is not on the same level as the fire base, $x_a$ will take a non-zero value which is a function of the line of sight to P.

Evidently, eq. (10) is not tractable by analytic methods. In CFC V, eq. (10) is integrated by the expedient of setting $\cos\phi$ to its average value, $\overline{\cos\phi} = 0$.
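The effect of this averaging can be explored numerically. The sketch below is not the paper's procedure (Section 2.2 integrates over $x$ analytically): it integrates over $z$ in closed form, using the elementary antiderivative $\int z\,dz/(z^2 - 2Lcz + A)^{3/2} = -(A - Lcz)/[(A - L^2c^2)(z^2 - 2Lcz + A)^{1/2}]$ with $c = \cos\phi$ and $A = x^2 + L^2$ (checkable by differentiation), and applies the midpoint rule in $x$ and $\phi$. Setting `average_cos=True` replaces $\cos\phi$ by 0 as CFC V does. The flame geometry in the usage lines ($r_f = 0.5$ m, $\psi = 30°$, $x_b$ at the cone apex, $\kappa = 1$ m$^{-1}$) is purely illustrative.

```python
import math

SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)


def emission_g(kappa: float, T_f: float) -> float:
    """g = 4 kappa sigma T_f^4 (W/m^3), eq. (2)."""
    return 4.0 * kappa * SIGMA * T_f ** 4


def cone_width(x: float, r_f: float, psi: float, x_a: float = 0.0) -> float:
    """z(x) = r_f + (x_a - x) tan(psi), eq. (11)."""
    return r_f + (x_a - x) * math.tan(psi)


def z_integral(x: float, L: float, c: float, zmax: float) -> float:
    """Closed form of int_0^zmax x z dz / (x^2 + z^2 + L^2 - 2 L z c)^(3/2), c = cos(phi)."""
    A = x * x + L * L
    D = A - (L * c) ** 2                      # = x^2 + L^2 sin^2(phi), > 0 for x > 0
    R = zmax * zmax - 2.0 * L * c * zmax + A
    at_top = -(A - L * c * zmax) / (D * math.sqrt(R))
    at_zero = -math.sqrt(A) / D
    return x * (at_top - at_zero)


def normal_flux(g, L, r_f, psi, x_a, x_b, nx=200, nphi=128, average_cos=False):
    """Eq. (10): z integrated exactly, x and phi by the midpoint rule.
    average_cos=True sets cos(phi) = 0, the simplification used in CFC V."""
    hx = (x_b - x_a) / nx
    if average_cos:
        cosines, dphi = [0.0], 2.0 * math.pi    # integrand no longer depends on phi
    else:
        dphi = 2.0 * math.pi / nphi
        cosines = [math.cos((j + 0.5) * dphi) for j in range(nphi)]
    total = 0.0
    for c in cosines:
        for i in range(nx):
            x = x_a + (i + 0.5) * hx            # midpoints never touch x = 0
            zmax = cone_width(x, r_f, psi, x_a)
            if zmax > 0.0:
                total += z_integral(x, L, c, zmax)
    return g / (4.0 * math.pi) * total * hx * dphi


if __name__ == "__main__":
    r_f, psi = 0.5, math.radians(30.0)        # illustrative flame, not from the paper
    x_b = r_f / math.tan(psi)                 # integrate up to the cone apex
    g = emission_g(kappa=1.0, T_f=1260.0)     # kappa = 1 m^-1 is hypothetical
    for L in (1.0, 2.0, 4.0):
        full = normal_flux(g, L, r_f, psi, 0.0, x_b)
        averaged = normal_flux(g, L, r_f, psi, 0.0, x_b, average_cos=True)
        print(f"L = {L} m: full {full:.4g} W/m^2, cos(phi) = 0: {averaged:.4g} W/m^2")
```

Using midpoints in $x$ sidesteps the points $x = 0$, $\phi = 0, \pi, 2\pi$ at which the closed-form pieces become undefined.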
Substituting $\cos\phi = \overline{\cos\phi} = 0$ into eq. (10) and integrating, one obtains

$$ \Phi_n = \frac{g}{2}\left[ (x_b^2 + L^2)^{1/2} - (x_a^2 + L^2)^{1/2} - \cos^2\psi\,(S_b - S_a) + \delta\sin\psi\cos^2\psi\,\ln\frac{S_a + x_a\cos\psi - r_f\sin\psi}{S_b + x_b\sec\psi - \delta\sin\psi} \right], \tag{12} $$

where

$$ \delta = r_f + x_a\tan\psi, \qquad S_a = (L^2 + x_a^2 + r_f^2)^{1/2}, \qquad S_b = \left(L^2 + x_b^2 + [r_f - (x_b - x_a)\tan\psi]^2\right)^{1/2} \tag{13} $$


(Mitler 1978). The error introduced by setting $\cos\phi = 0$ in eq. (10) is compensated for by multiplying eq. (12) by


nd)  = 


'  O.SOeSdP,  d  <  1 

,  .  0.3787  ^ 

^■^0^349^’  ®  -  ^ 


(14) 


where  d  =  —  and  p  =  2.825  (Mitler  1978).  For  an  optically  thick  flame,  the 
modified normal flux, $\Phi_n'$, is given by


K = m 


1  -  e' 


where the effective optical depth, $\tau$, through the source is given by

r  =  -/Gr/(1  +  0.84f*), 

TT 

Tf 

where  f  =  ^  1- 

For  the  case  in  which  P  lies  within  another  fire,  is  given  by 


(15) 


(16) 


=  m 


1  —  c‘ 


.-’•i 


(17) 


where  ti  =  and  ki  and  r/^  are  the  absorptivity  and  radius,  respectively,  of 

the  flame  at  P.  For  the  case  of  a  flame  radiating  to  its  own  base,  an  average  value 
for  is  used  where 


=  trrM  1  -  e 


-0.7755 


(tT* 


(18) 


In CFC V, the fire spread is modelled by a semi-empirical formulation which uses an expression for $\bar\Phi$, the average normal flux to the base of the fire, given by


(19) 
where $A$ is an experimentally determined fire spread parameter, $B = \sigma T_f^4$, and $\dot r_f = dr_f/dt$. By inverting eq. (19), one obtains

$$ \dot r_f = -A\ln\left(1 - \frac{\bar\Phi}{B}\right). \tag{20} $$

A value for $r_f(t)$ at each time step of the integration can then be obtained from

$$ r_f(t_0 + \Delta t) = r_f(t_0) + \int_{t_0}^{t_0 + \Delta t} \dot r_f\,dt. \tag{21} $$


There are several problems associated with this method of determining $r_f$. The principal difficulty is that eq. (19) was derived under an implicit assumption that $\dot r_f \approx 0.01\,r_f$. This assumption derives from a series of tests on polyurethane foam, and there is no particular guarantee that it holds in all circumstances or for other materials.

2.2  Evaluation  of  Flux  and  Fire  Spread 

It is possible to evaluate eq. (10) partially in closed form by rearranging the order of the integrations, thus:

$$ \Phi_n = \frac{g}{4\pi}\int_0^{2\pi}\int_{x_a}^{x_b}\int_0^{z(x)} \frac{xz}{(x^2 + z^2 + L^2 - 2Lz\cos\phi)^{3/2}}\,dz\,dx\,d\phi. \tag{22} $$


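Integrating eq. (22) with respect to $z$ yields

$$ \Phi_n = \sum_{i=1}^{4}\Phi_i, \tag{23} $$

where

$$ \Phi_1 = \frac{g}{4\pi}\int_0^{2\pi}\int_{x_a}^{x_b} \frac{xL\cos\phi\,(z(x) - L\cos\phi)}{(x^2 + L^2\sin^2\phi)\,(x^2 + z^2(x) + L^2 - 2Lz(x)\cos\phi)^{1/2}}\,dx\,d\phi, \tag{24} $$

$$ \Phi_2 = -\frac{g}{4\pi}\int_0^{2\pi}\int_{x_a}^{x_b} \frac{x}{(x^2 + z^2(x) + L^2 - 2Lz(x)\cos\phi)^{1/2}}\,dx\,d\phi, \tag{25} $$

$$ \Phi_3 = \frac{g}{4\pi}\int_0^{2\pi}\int_{x_a}^{x_b} \frac{xL^2\cos^2\phi}{(x^2 + L^2)^{1/2}\,(x^2 + L^2\sin^2\phi)}\,dx\,d\phi, \tag{26} $$

$$ \Phi_4 = \frac{g}{4\pi}\int_0^{2\pi}\int_{x_a}^{x_b} \frac{x}{(x^2 + L^2)^{1/2}}\,dx\,d\phi. \tag{27} $$

Now, $\Phi_4$ can be completely integrated, thus:

$$ \Phi_4 = \frac{g}{2}\left( (x_b^2 + L^2)^{1/2} - (x_a^2 + L^2)^{1/2} \right). \tag{28} $$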
The integrals for $\Phi_1$, $\Phi_2$ and $\Phi_3$ can be evaluated analytically only with respect to $x$. $\Phi_3$ becomes

$$ \Phi_3 = \frac{g}{8\pi}\int_0^{2\pi} L|\cos\phi| \left[ \ln\frac{(x^2 + L^2)^{1/2} - L|\cos\phi|}{(x^2 + L^2)^{1/2} + L|\cos\phi|} \right]_{x_a}^{x_b} d\phi. \tag{29} $$

In order to proceed further with the integration of $\Phi_1$ and $\Phi_2$, it is assumed that $z(x)$ is always an expression of the form

$$ z(x) = a - bx. \tag{30} $$

Substituting eq. (30) into eq. (25), one obtains

$$ \Phi_2 = -\frac{g}{4\pi}\int_0^{2\pi}\int_{x_a}^{x_b} \frac{x\,dx\,d\phi}{\left((1+b^2)x^2 + 2b(L\cos\phi - a)x + L^2 + a^2 - 2La\cos\phi\right)^{1/2}}. \tag{31} $$

Integrating eq. (31) with respect to $x$ yields

$$ \Phi_2 = -\frac{g}{4\pi}\int_0^{2\pi}\left[ \frac{Q^{1/2}}{1+b^2} - \frac{b(L\cos\phi - a)}{(1+b^2)^{3/2}}\,\ln\left( 2\sqrt{(1+b^2)\,Q} + 2(1+b^2)x + 2b(L\cos\phi - a) \right) \right]_{x_a}^{x_b} d\phi, \tag{32} $$

where $Q = (1+b^2)x^2 + 2b(L\cos\phi - a)x + L^2 + a^2 - 2La\cos\phi$.

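The substitution of eq. (30) into eq. (24) and the expression of the result in terms of partial fractions produce

$$ \Phi_1 = \Phi_{11} + \Phi_{12}, \tag{33} $$

where

$$ \Phi_{11} = -\frac{g}{4\pi}\int_0^{2\pi}\int_{x_a}^{x_b} \frac{L\cos\phi\,(L\cos\phi - a)\,x - bL^3\cos\phi\sin^2\phi}{(x^2 + L^2\sin^2\phi)\,Q^{1/2}}\,dx\,d\phi, \tag{34} $$

$$ \Phi_{12} = -\frac{g}{4\pi}\int_0^{2\pi}\int_{x_a}^{x_b} \frac{bL\cos\phi}{Q^{1/2}}\,dx\,d\phi, \tag{35} $$

with $Q = (1+b^2)x^2 + 2b(L\cos\phi - a)x + L^2 + a^2 - 2La\cos\phi$.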
Equation (35) can be integrated directly, resulting in

$$ \Phi_{12} = -\frac{g}{4\pi}\int_0^{2\pi} \frac{bL\cos\phi}{(1+b^2)^{1/2}} \left[ \ln\left( 2\sqrt{(1+b^2)\,Q} + 2(1+b^2)x + 2b(L\cos\phi - a) \right) \right]_{x_a}^{x_b} d\phi. \tag{36} $$

The integration of $\Phi_{11}$ requires some additional effort. Using the substitutions suggested by Gradshteyn and Ryzhik (1980, pp. 80-81), one finds that

$$ \Phi_{11} = -\frac{g}{8\pi}\int_0^{2\pi} L\cos\phi \left[ \ln\frac{Q^{1/2} - bx - L\cos\phi + a}{Q^{1/2} + bx + L\cos\phi - a} \right]_{x_a}^{x_b} d\phi, \tag{37} $$

where $Q = (1+b^2)x^2 + 2b(L\cos\phi - a)x + L^2 + a^2 - 2La\cos\phi$.


By substituting eqs. (28), (29), (32), (36), and (37) into eq. (23), one can integrate the resulting expression for $\Phi_n$ numerically with respect to $\phi$ and so obtain an exact expression for the flux at the point P.

There are four special cases which must be dealt with individually: that for which $L = 0$, and those for which $x = 0$ while $\phi = 0$, $\pi$, or $2\pi$.

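When $L = 0$ (with $x_a = 0$), $\Phi_n$ is given by

$$ \Phi_n = \frac{g}{2}\left[ x_b - \frac{\left((1+b^2)x_b^2 - 2abx_b + a^2\right)^{1/2} - a}{1+b^2} - \frac{ab}{(1+b^2)^{3/2}}\,\ln\frac{\sqrt{1+b^2}\left((1+b^2)x_b^2 - 2abx_b + a^2\right)^{1/2} + (1+b^2)x_b - ab}{a\left(\sqrt{1+b^2} - b\right)} \right]. \tag{38} $$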


The integrands of eqs. (29), (32)-(37) are undefined when $x = 0$ at the same time as $\phi = 0$, $\pi$, $2\pi$. When this arises during the numerical integration of these equations with respect to $\phi$, the limit of $\sum_{i=1}^{4}\Phi_i$ as $x \to 0$ must be taken. Explicitly,


t  =  l 


=  ^  (  Lln(  ^  ) 

0  =  0 

47r  \  \L  -  a) 

L-a 


^  =  2ir 


and 


limE*, 

t=l 


(6*  + 


(62  +  1) 

+ 1)*)|  7 


Zf  +  a 


+  1) 


(62  +  l)f 


(6(6*L  -  a)  ln((6*  +  1)3  -  b)] 


(39) 


(40) 


d(f). 
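Finally, when the distance $L$ of the field point is very much greater than $r_f$, eq. (22) can be expressed as

$$ \Phi_n = \frac{g}{4\pi L^3}\int_0^{2\pi}\int_{x_a}^{x_b}\int_0^{z(x)} xz\left[ 1 + \frac{3z\cos\phi}{L} + \frac{3}{2L^2}\left( (5\cos^2\phi - 1)z^2 - x^2 \right) + \dots \right] dz\,dx\,d\phi. \tag{41} $$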

The  integration  of  eq.  (41)  yields 
= 


g 

x2 

2abx 

"  4) 

413 

.  U 

3 

X* 

f3b*x* 

b^x* 

9ai^x* 

[  16 

4 

10 

27a^b*x^ 

3a*  X* 

a^bx 

(42) 


+ 


16 


+ 


16 


Once $\Phi_n$ has been obtained, either numerically or by means of direct integration where possible, $\Phi_n'$ may be found by applying eqs. (16)-(17) with $f(d) = 1$, since there is no error arising from the use of an average value for $\cos\phi$ in the calculations described above. For the case of $L = 0$, $\tau = 0.2062994\,\kappa h_f$, which corresponds to the average optical depth at the height at which the volume of the cone is divided into two equal parts.

In order to obtain the total normal flux, $q''$, at a point P, one must sum not only the contributions to the flux from all flames visible at that point, but also those from the walls, ceiling, and hot layer.

Once the flux $q''$ is known, the surface temperature distribution across any object may be calculated using the one-dimensional heat conduction equation


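$$ \frac{\partial T}{\partial t} = \alpha\,\frac{\partial^2 T}{\partial x^2} \tag{43} $$

under the condition that

$$ -k\,\frac{\partial T}{\partial x}\bigg|_{x_a} = q''(x_a, z, \Delta t) - h\left( T(x_a, z, \Delta t) - T(x_a, z, 0) \right) \tag{44} $$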

(Quintiere  1981).  It  should  be  noted  at  this  point  that  the  coordinate  system  of 
eqs.  (43)-(44)  is  that  of  Fig.  1,  and  hence  x  is  the  vertical  coordinate  and  z  the 
horizontal. 

The solution to eq. (43) is given by


q"{Xa,Z,s) 


r(x.,x,A,)-r(x.,x,o)  =  («) 

—  ^  ®xp  (a(At  —  s))  erfcy^o(At  —  5)^5 
where $\rho$ is now the density of the object, $h$ the heat transfer coefficient, $k$ the thermal conductivity, $c$ the specific heat capacity, $\alpha = k/(\rho c)$ the thermal diffusivity, and


If $\Delta t = t - t_0$ is sufficiently small, one can write

(46)

$$ q''(x_a, z, t) = q''(x_a, z, t_0) + \frac{\partial q''}{\partial t}\bigg|_{t = t_0}\Delta t. \tag{47} $$


Setting $t_0 = 0$ in eq. (47) and substituting the result into eq. (45), one obtains an expression for the change, $\Delta T$, in the temperature $T$ across the surface of an object during a time interval $\Delta t$:


AT  = 


(^a> 

».o)  f 

dq" 

( 

dt 

.  J 

<0=0  V 

4y/d  (At)’ 

erf(\/aAt)e°'^*  2\/aAt  2\/aAt 

h  h  ^  y/ivk  y/irh 


erf(v^aAt)c 
ah 


aAt 


.aAt 


ah 


+ 


Ay/a  (At)  * 
Zy/Hk 


Zy/Hh 


M  J_ 

h  y/¥ah  ah 


(48) 


Using eq. (48), one can evaluate the temperature across the surface of an object at as many points as desired for each value of $\Delta t$. If the ignition temperature $T_{ig}$ of an object is known, it is possible to calculate with eq. (48) the time at which the temperature at each point exceeds $T_{ig}$, and so determine the present radius of the flame.
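As a hedged illustration of this ignition-time calculation, consider the simplest case of a constant absorbed flux ($\partial q''/\partial t = 0$, whereas the paper keeps a term linear in time) with a convective loss $h(T - T_0)$ at the surface, as in the boundary condition (44). The classical closed-form solution for a semi-infinite solid is then $\Delta T(t) = (q''/h)\,[1 - e^{\beta t}\,\mathrm{erfc}(\sqrt{\beta t})]$ with $\beta = h^2\alpha/k^2$, and the sketch below finds by bisection the time at which $T_0 + \Delta T$ reaches $T_{ig}$. The function names and the material numbers in the usage lines are hypothetical, not the polyurethane data used by the authors.

```python
import math
from typing import Optional


def surface_rise(q: float, h: float, k: float, alpha: float, t: float) -> float:
    """Surface temperature rise (K) of a semi-infinite solid after time t (s)
    under constant absorbed flux q (W/m^2) and convective loss h (W/m^2 K):
        dT = (q/h) * (1 - exp(beta t) * erfc(sqrt(beta t))),  beta = h^2 alpha / k^2.
    """
    if t <= 0.0:
        return 0.0
    u = h * math.sqrt(alpha * t) / k                 # u = sqrt(beta * t)
    if u < 26.0:
        scaled = math.exp(u * u) * math.erfc(u)
    else:
        # asymptotic expansion of exp(u^2) erfc(u); avoids overflow of exp(u^2)
        scaled = (1.0 - 0.5 / u**2 + 0.75 / u**4) / (u * math.sqrt(math.pi))
    return q / h * (1.0 - scaled)


def ignition_time(q, h, k, alpha, T0, T_ig, t_max=1.0e7, tol=1.0e-6) -> Optional[float]:
    """Time at which T0 + surface_rise first reaches T_ig (bisection), or None
    if the steady-state surface temperature T0 + q/h never gets there."""
    if T0 + q / h <= T_ig:
        return None
    lo, hi = 0.0, 1.0
    while T0 + surface_rise(q, h, k, alpha, hi) < T_ig:   # bracket the crossing
        lo, hi = hi, 2.0 * hi
        if hi > t_max:
            return None
    while hi - lo > tol * max(1.0, hi):                     # rise is monotone in t
        mid = 0.5 * (lo + hi)
        if T0 + surface_rise(q, h, k, alpha, mid) < T_ig:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical numbers: 20 kW/m^2 flux, h = 10 W/m^2 K,
    # k = 0.05 W/m K, alpha = 1e-7 m^2/s, ambient 293 K, T_ig = 600 K.
    t_ig = ignition_time(2.0e4, 10.0, 0.05, 1.0e-7, 293.0, 600.0)
    print(f"ignition after about {t_ig:.2f} s")
```

Applying `ignition_time` at each of several points along an object, each with its own local flux, gives the sequence of ignition times from which a spread rate can be estimated.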


3  Numerical  Methods 

The basic equations governing the spread of fires used in CFC V have been documented in several places, e.g. Mitler (1978) and Mitler and Emmons (1981). They constitute a set of coupled linear and non-linear simultaneous algebraic equations, linear ordinary differential equations with respect to time, and one partial differential equation, namely that for the diffusion of heat into a solid.

Two methods of solution are used in CFC V: a successive substitution method, and a Newton-Raphson method for use when successive substitution fails. The convergence criterion for the equations is that the scaled difference between successive iterations of the system of equations be less than a predetermined value $\epsilon$. These methods of solution have not been altered in the modified version of CFC V described above. Presently, $\epsilon$ has been chosen to be 1 x 10~^.
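The two-stage strategy described above can be sketched on a single scalar equation $x = G(x)$. This is a hypothetical illustration, not code from CFC V: the scaled difference is taken here as $|x_{k+1} - x_k| / \max(|x_{k+1}|, 1)$, the Newton-Raphson step uses a central-difference derivative of $F(x) = x - G(x)$, and the tolerance and iteration limits are arbitrary choices.

```python
import math
from typing import Callable, Tuple


def scaled_diff(new: float, old: float) -> float:
    """Change between successive iterates, scaled by the size of the iterate
    (an assumed form of the 'scaled difference' criterion)."""
    return abs(new - old) / max(abs(new), 1.0)


def solve(G: Callable[[float], float], x0: float, eps: float = 1e-10,
          max_iter: int = 200) -> Tuple[float, str]:
    """Solve x = G(x): successive substitution first, Newton-Raphson if it fails."""
    x = x0
    for _ in range(max_iter):                       # successive substitution
        x_new = G(x)
        if not math.isfinite(x_new):
            break                                   # diverged: give up on substitution
        if scaled_diff(x_new, x) < eps:
            return x_new, "substitution"
        x = x_new

    def F(y: float) -> float:                       # F(y) = 0  <=>  y = G(y)
        return y - G(y)

    x = x0                                          # restart from the initial guess
    for _ in range(max_iter):                       # Newton-Raphson
        step = 1e-7 * max(abs(x), 1.0)
        slope = (F(x + step) - F(x - step)) / (2.0 * step)   # central difference
        if slope == 0.0:
            break
        x_new = x - F(x) / slope
        if scaled_diff(x_new, x) < eps:
            return x_new, "newton"
        x = x_new
    raise RuntimeError("neither successive substitution nor Newton-Raphson converged")


if __name__ == "__main__":
    print(solve(math.cos, 1.0))                        # converges by substitution
    print(solve(lambda v: 3.5 * v * (1.0 - v), 0.5))   # substitution cycles; Newton gives 5/7
```

In the second example the substitution iterates settle into a cycle and never meet the convergence test, so the Newton-Raphson fallback is what finds the fixed point.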

CFC V has been modified to incorporate the changes described in section 2.2. The original expression for the flux, eq. (12), was replaced in the computer programme by the sum of eqs. (28), (29), (32), (36), and (37), by eq. (38) for $L = 0$, or by eq. (42) for $L > 10\,r_f$.

The integrals with respect to $\phi$ were evaluated using the composite trapezoidal rule and Richardson extrapolation. That is, two estimates, $I_{n_1}$ and $I_{n_2}$, of $\sum_{i=1}^{4}\Phi_i$ were obtained using subintervals of $2\pi/n_1$ and $2\pi/n_2$, respectively. The final estimate, $I^*$, of the integral was found by means of

$$ I^* = \frac{I_{n_2} - (n_1/n_2)^2\,I_{n_1}}{1 - (n_1/n_2)^2}. \tag{49} $$

In this case, $n_1 = 32$ and $n_2 = 64$.
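The extrapolation in eq. (49) can be sketched as follows. The composite trapezoidal rule has an error expansion in even powers of the step size, so combining the estimates for $n_1 = 32$ and $n_2 = 64$ subintervals as in eq. (49) cancels the leading $h^2$ term. The function names are hypothetical, and the test integrand ($\sin$ on $[0, \pi]$, exact value 2) is only a check, not one of the flux integrands.

```python
import math
from typing import Callable


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite trapezoidal rule with n equal subintervals on [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def richardson_trapezoid(f: Callable[[float], float], a: float, b: float,
                         n1: int = 32, n2: int = 64) -> float:
    """Eq. (49): I* = (I_n2 - (n1/n2)^2 I_n1) / (1 - (n1/n2)^2)."""
    i1 = trapezoid(f, a, b, n1)
    i2 = trapezoid(f, a, b, n2)
    r = (n1 / n2) ** 2
    return (i2 - r * i1) / (1.0 - r)


if __name__ == "__main__":
    plain = trapezoid(math.sin, 0.0, math.pi, 64)
    extrapolated = richardson_trapezoid(math.sin, 0.0, math.pi)
    print(abs(plain - 2.0), abs(extrapolated - 2.0))   # the second error is far smaller
```

With $n_2 = 2n_1$ the formula reduces to the familiar $I^* = (4I_{n_2} - I_{n_1})/3$.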

In eq. (48), the derivative $\partial q''/\partial t\,|_{t_0 = 0}$ was approximated by a finite-difference quotient. The surface temperature of combustible objects was calculated by means of eq. (48). The rate of flame spread, $\dot r_f$, was estimated from the temperatures calculated at a series of points along the burning object. If $z_i$ is the closest exterior point to the flame radius at which the temperature has been calculated, then


$$ \dot r_f = \frac{z_i - r_f(t_0)}{T(z_i, t_0 + \Delta t) - T_{ig}}\,\frac{\Delta T}{\Delta t}. \tag{50} $$


Equation (21) was then used to determine $r_f$.


4  Numerical  Results  and  Discussion 

For the purposes of example, it was decided to simulate a fire in a compartment with aluminium walls and ceiling, with dimensions of 9.14 × 5.8 × 2.4 m, which contained two combustible objects. There were two vents, one near the ceiling and the other near the floor, and a door with dimensions 1.8 × .69 m. Air was allowed to circulate freely through all three openings. Since most common items of furniture are composed largely of polyurethane foam, it was decided to treat the two objects as parallelepipeds made entirely of polyurethane. The dimensions of the two objects were chosen to be 4.5 × .72 × 1.73 m and 2.06 × .58 × 1.73 m. They were located along adjacent walls, with the separation between their centres being 2.73 m. A fire, whose initial radius was set to .037 m, was assumed to have been started at the centre of the larger object (hereafter called object 1) at $t = 0$. Since both objects possessed the same height, $x_a = 0$ and $x_b = h_f$, and hence $a = r_f$ and $b = \tan\psi$ in eqs. (30)-(42). Following Mitler’s (1978) work, it was decided to set $T_f = 1260$ K and the initial value of $\psi$ to 30°. The initial temperature of the ambient air was taken as 293 K, as was that of the second (non-burning) object. Because the fire had just started, it was assumed that the surface temperature of object 1 was ambient, except at those points located inside $r_f$, where the temperature was assumed to be 740 K.
Figure 2 shows the results of a calculation carried out using both the original version of CFC V and a modified version incorporating the changes described in the previous section. As can be seen, for the first 16 seconds both versions yield approximately the same results, after which the modified version of CFC V predicts a marked increase in the rate of flame spread. This rapid increase after 16 seconds seems to be largely an artefact of the calculation. In order to keep the computation times as short as possible, the changes in temperature, $\Delta T$, were evaluated at only four points on each object. For ease of computation, CFC V simulates all objects by cylinders of the same height as the objects they represent, whose radii $R_0$ are chosen so as to give each cylinder the same total surface area as the object it models. The four points at which the temperatures were evaluated were then $r_0 = 0$,

fi  =  .05  m,  r2  =  2  —  ^3  ~  sharp  transition  at  16  seconds  occurs 

at the point at which the application of eq. (50) for the estimation of $\dot r_f$ is switched from $r_1$ to $r_2$, and hence it is reasonable to expect that much of the apparent increase in $\dot r_f$ is due to estimation errors. This could be improved by substantially increasing the number of points, at a considerable sacrifice in computation speed.

Figure 3 shows the averaged radiative flux from the flame on object 1. As can be seen, where the two fire radii are nearly the same, the modified code predicts significantly less flux than the original. This difference seems to arise because the term represented by $\Phi_3$ in our version was set to zero in the original version of CFC V. Since this term is always negative, it has the effect of significantly reducing the flux from the value predicted by the original version of CFC V.

In the original version of CFC V, the surface temperature of any burning object was always set to $T_{ig}$ for all points on the object’s surface. In our modified version of CFC V, the average temperature of an object was calculated by assuming that the centre of the fire maintained a constant temperature of $T_{ig}$ and that the temperature varied linearly from $r_0$ to $r_3$. Figure 4 shows the result of this calculation. Figure 5 shows the same calculation for object 2. Since this object never ignited, the linear approximation of the temperature variation across its surface should be rather more realistic than for object 1. The slightly uneven temperature rise in the modified version of CFC V arises because the contribution to the total flux from the walls of the compartment varies from moment to moment. Object 2 shows no temperature rise in the original version of CFC V because the flux on object 2 never rises above .01 W/m², the level below which the flux is considered negligible in both versions of CFC V.

Finally, Figs. 6 and 7 show the surface temperature and flux, calculated with the modified version of CFC V, at $r_0$, $r_1$, $r_2$ and $r_3$. It was assumed that the temperature inside $r_f$ would always be constant at $T_{ig}$. The calculation of the flux at each point (except for $L = 0$, which is a special case) was terminated when the fire reached that point, since eqs. (28), (29), (32), (36), (37) are not valid for $L < r_f$.
5 Concluding Remarks

In this work, we have discussed the modifications made to CFC V in order to improve the calculation of the flux to an object from a conical flame. It was then shown how the value of the flux could be used to determine the temperature variation across the surface of an object, and consequently the rate of flame spread across the object. Some numerical results were presented and compared with results from the unmodified version for identical initial conditions.

Considerable work remains to be done to improve the fidelity of this fire simulation. The number of points at which the temperature profile across a burning object is calculated should be increased, thereby improving the accuracy of the computed flame spread, and more accurate models should be developed for the flux from the flames to the walls and ceiling, as well as for the exchange of heat between the hot layer and objects in the room.
ON THE NUMERICAL SOLUTION OF A SYSTEM OF PARTIAL DIFFERENTIAL EQUATIONS TO
OBTAIN THE WIND FROM THE GEOPOTENTIAL FOR NUMERICAL WEATHER PREDICTION AND
ON RELATED MATHEMATICAL ASPECTS.


H.  Baussus  von  Luetzow 
U.S.  Army  Engineer  Topographic  Laboratories 


ABSTRACT. The paper first discusses the numerical solution of a system of partial differential equations, used as optimal filter equations, to obtain the wind field from the geopotential field subject to solution constraints. It then addresses the use of these equations in the case of available horizontal winds, constrained initialization in the sense of Sasaki, nonstatistical smoothing, univariate and multivariate estimation and smoothing of meteorological variables, and a special problem of data assimilation.

Finally, it outlines a non-hydrostatic prognostic approach for the case of highly accurate and dense initial meteorological fields. Although emphasizing the mathematical point of view, including the need for parallel and large-scale computing, the paper also endeavors to relate to the present state of the art in numerical weather prediction.

I. INTRODUCTION. Present operational numerical weather prediction models use the hydrostatic equation and require hydrodynamic and hydrostatic stability, which, in the numerical solution process, is enforced by a convective adjustment. The horizontal grid length used is generally greater than 50 km, and the number of vertical levels is usually 15 or less. In a prediction system with pressure $p$ as the vertical independent coordinate, the dependent variables are the wind components $U$ and $V$, the geopotential, the diabatic heating rate, and the mixing ratio, which is sometimes replaced by the relative humidity. Further considered are the saturation mixing ratio, the precipitation criterion, and the sea surface temperature. The so-called primitive equations, which incorporate the hydrostatic equation, are the two equations of horizontal motion, the continuity equation, the thermodynamic equation, and the continuity equation for the mixing ratio. The latter may be supplemented by continuity equations for substances other than water.

In  global  models,  spherical  coordinates  are  employed  in  the  horizontal. 

Most  models  use  normalized  pressure  where  stands  for  surface  pressure. 

Further,  so-called  spectral  methods  are  generally  used  in  major  forecast 
centers  to  compute  horizontal  derivatives  with  a  high  accuracy.  The  determination 
of  initial  fields  of  dependent  variables  is  the  subject  of  objective  analysis. 

The most important endeavor is the estimation of $u$, $v$, and $\Phi$ at regular
grid points from irregularly distributed geopotential and wind data by
multivariate statistical methods and the geostrophic relationship between the
wind and the geopotential. In the past, geopotential data have been considerably
more numerous than wind data, and in the near future this situation is not
expected to change. As a consequence, in order to use wind and geopotential
data in the numerical integration process, filter equations have been developed,
classified as static, dynamic, and normal mode initialization. Unfortunately,
the determination of the wind field from the geopotential field is unsatisfactory
in the equatorial belt, where wind estimates are presently obtained at about
2 levels from the movements of clouds, and not in a desirable density. New




earth  observing  systems  under  development  are  expected  to  provide  highly 
accurate  and  dense  wind  measurements  over  many  areas  of  the  globe.  Denser 
and  more  accurate  measurements  of  surface  pressure,  temperature,  and  humidity 
will  be  of  utmost  importance  for  numerical  weather  prediction. 

Experience has shown that numerical humidity forecasts are satisfactory only
for 1-2 days, in contrast with the better predictions of the other meteorological
variables. This is partially due to the coarse representation of the humidity
field, which requires a relatively finer grid resolution, both horizontally
and vertically. However, incorporation of a more detailed humidity field
would result in a greater number of hydrodynamic-hydrostatic instabilities
through the interaction of the continuity equation of water vapor with the
thermodynamic equation. As shown by Baussus von Luetzow (1980), convection
on the mesoscale, requiring a horizontal grid resolution of about 10 km,
necessitates a more sophisticated equation for the vertical wind velocity
and the application of the unmodified continuity equation in a coordinate
system with $z$ as the vertical coordinate. Parameterization of certain
subgrid processes, like moist cumulus convection, would also be more successful
in the non-hydrostatic prediction system. In the same paper, Baussus von Luetzow
described a signal generation process approximately equivalent to the present
hydrostatic forecast system, particularly for a period of several days.
Significant in this respect is the statement by Ghil and Childress (1987)
that the practical limit of usefulness of numerical weather forecasts is
between 3 and 7 days. This limit could, however, be extended by nonhydrostatic
forecasts in combination with a much denser and more accurate data base than
is presently available. The specification of the lower and upper boundary values
required for the numerical solution process presents additional difficulties:
their inaccuracies tend to degrade the forecast with increasing prediction
time. As to an improvement of upper boundary values, Baussus von Luetzow's
signal generation system, cited above, reduced to one level, has some potential
value.


This paper addresses in section II the numerical solution of optimal
filter  equations  with  emphasis  on  the  determination  of  the  wind  from  the 
geopotential  as  the  main  effort.  Section  III  contains  some  considerations 
about  the  incorporation  of  friction.  The  performance  characteristics  of 
new  earth  observing  systems  are  shown  in  section  IV.  The  use  of  filter  equations 
as  conditional  equations,  including  constrained  initialization,  is  the  subject 
of  section  V.  Section  VI  is  concerned  with  a  critique  of  objective  analysis 
as  practiced  at  this  time.  Relevant  comments  about  a  nonhydrostatic  approach 
are  made  in  section  VII,  and  section  VIII  enumerates  some  pertinent  conclusions. 
Throughout  the  paper,  including  the  introduction,  the  author  has  endeavored 
to  relate  to  the  present  state  of  the  art  in  numerical  weather  prediction 
and  to  offer  some  new  and/or  relevant  points  of  view. 


II.  NUMERICAL  SOLUTION  OF  OPTIMAL  FILTER  EQUATIONS.  According  to  Baussus 
von  Luetzow  (1971,  1980),  the  following  system  of  diagnostic  filter  equations 
can  be  derived  from  the  hydrostatic  equations  of  motion  without  friction 
under partial use of the thermodynamic equation in a planar $x, y, p, t$-system:


(1) [equation illegible in the source]

(2) [equation illegible in the source]


In eq. (1), $f$ is the Coriolis parameter. In eq. (2), $\omega$ is the generalized
vertical velocity, and $\sigma$ is the effective static stability. The remaining
terms of eq. (2) are functions of the geopotential and of coefficients comprised
of functionals involving $u$, $v$, and spatial derivatives thereof, and of the
radiation component of the diabatic heating rate.

Subsequently,  Baussus  von  Luetzow  (1988)  showed  that  eqs.  (1)  and  (2) 
are  optimal  filter  equations  or  equilibrium  solutions  free  of  high  frequency 
gravity-inertia  waves  and  superior  to  normal  mode  initialization.  Filter 
equations using the spherical coordinates $\varphi$ and $\lambda$, which correspond to
eqs. (1) and (2), can be derived as well.

Actually, the filtered variables in eqs. (1) and (2) should be designated
by a distinguishing symbol to set them apart from unfiltered variables. However,
$\Phi$ tends to be a non-fluctuating variable in the first place, and it is
the primary purpose of eqs. (1) and (2) to determine filtered or relatively
smooth winds primarily from the geopotential. Finally, the solution of the
prognostic equations in a discrete manner implies the use of sufficiently
smooth variables.


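In the iterative, interactive solution of the system (1) and (2), it is
necessary to observe the Helmholtz decomposition

$$u = -\frac{\partial \psi}{\partial y} + \frac{\partial \chi}{\partial x} \qquad (3a)$$

$$v = \frac{\partial \psi}{\partial x} + \frac{\partial \chi}{\partial y} \qquad (3b)$$

where $\psi$ is the stream function and $\chi$ the velocity potential. This
decomposition applies only to filtered variables free of vertical transverse waves.

Using eqs. (3a) and (3b), vorticity and divergence can be expressed as

$$\zeta = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} = \nabla^2 \psi, \qquad D = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = \nabla^2 \chi \qquad (4)$$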
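The following is a minimal numerical sketch of eqs. (3a), (3b) and (4). It assumes a doubly periodic grid and centered differences, and it does not model the paper's limited-area domain or boundary treatment. The test fields $\psi$ and $\chi$ are arbitrary, and all function names are hypothetical.

```python
import numpy as np

def ddx(f, d):
    """Centered difference in x (axis 1) on a doubly periodic grid."""
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * d)

def ddy(f, d):
    """Centered difference in y (axis 0) on a doubly periodic grid."""
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * d)

def helmholtz_winds(psi, chi, d):
    """Eqs. (3a), (3b): rotational part from psi plus divergent part from chi."""
    u = -ddy(psi, d) + ddx(chi, d)
    v = ddx(psi, d) + ddy(chi, d)
    return u, v

def wide_laplacian(f, d):
    """Laplacian built from two centered first differences (stencil width 2d)."""
    return ddx(ddx(f, d), d) + ddy(ddy(f, d), d)

n, d = 64, 1.0
y, x = np.meshgrid(np.arange(n) * d, np.arange(n) * d, indexing="ij")
k = 2 * np.pi / (n * d)                       # keeps the test fields periodic
psi = np.sin(k * x) * np.cos(2 * k * y)
chi = np.cos(3 * k * x) + np.sin(k * y)

u, v = helmholtz_winds(psi, chi, d)
zeta = ddx(v, d) - ddy(u, d)                  # vorticity
div = ddx(u, d) + ddy(v, d)                   # divergence
```

On the periodic grid, `ddx` and `ddy` commute. The discrete vorticity therefore equals the (wide-stencil) Laplacian of $\psi$ to rounding error, and the discrete divergence equals that of $\chi$, as in eq. (4).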

Because  of  the  dominance  of  the  vorticity  in  horizontal  motion,  the  first 
approximation  to  eq.  (1)  can  be  formulated  as 
It is obvious that eq. (5) is not effective in the equatorial belt, where
the neglected terms involving the velocity potential $\chi$ approach those associated
with the stream function $\psi$. Wind observations in the equatorial belt are
thus indispensable for successful numerical weather prediction in the tropics.
Additionally, the source function on the right side of eq. (5) and the partial
derivatives of the geopotential are weak in the equatorial belt.


If the smooth wind field is known, the geopotential can be computed by means
of eq. (1) without reference to the omega equation (2). Again, the corresponding
source function would generally be small in the equatorial belt.

Because of the relatively loose functional relationship between wind and geopotential
in the equatorial belt, both should be determined independently, and as accurately
as possible, by measurements.


The solution criterion for the first approximation is obtained from
eq. (5) as eq. (6). It is generally fulfilled, corresponds to the criterion
applicable to the omega equation (2), and can be imposed if necessary. In this
respect, remember that $\Phi$ is generally not free of "measurement" errors.


The solution criteria for the omega equation are the positivity conditions
(7a), (7b), and (7c). They can be imposed if required.


The filter equations (1) and (2) are applicable for a horizontal grid size
$\Delta x = \Delta y \approx 100$ km, compatible with about 15 vertical levels.

Equation (5) may, for example, be solved for a square region 2000 km × 2000 km
with $\Delta x = \Delta y = 200$ km. In this case, there are 81 $\psi$-unknowns.

An effective solution process is the following:

(1) Formulation of finite difference equations for each interior grid point.

(2) Computation of the functionals, etc.

(3) Establishment of the matrix equation

$$A\,\psi^{(0)} = b \qquad (8)$$

for the first approximation $\psi^{(0)}$, with $\psi^{(0)}$ prescribed on the boundary.

(4) Solution of eq. (8) as

$$\psi^{(0)} = A^{-1} b \qquad (9)$$

but only for the central interior point.

(5) Solutions of the type of eq. (9) for moving central points, obtained by
translational shifts of the square integration area.

(6) Identification of about 25 square regions as an aggregate region.

(7) Parallel processing for separate aggregate regions.

(8) Improvement of the initial solutions by improved (corrected) boundary
values, eq. (10).
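The next sketch covers steps (1)-(5). It is offered under stated assumptions. A linear Poisson problem $\nabla^2\psi = F$ stands in for eq. (5), whose exact form is not reproduced here. It also uses a 5-point stencil and dense `numpy.linalg.solve`. The function names are hypothetical. A 9 × 9 interior window gives the 81 unknowns of the 2000 km × 2000 km example, and only the central value of each window solution is kept.

```python
import numpy as np

def poisson_matrix(n, h):
    """5-point Laplacian for an n x n block of interior points, unknowns row-major."""
    T = np.diag(np.full(n, -2.0)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    eye = np.eye(n)
    return (np.kron(eye, T) + np.kron(T, eye)) / h**2

def solve_window(source, psi_bnd, h):
    """Steps (3)-(4): solve lap(psi) = source on one n x n window.

    psi_bnd has shape (n+2, n+2); only its edge rows/columns (boundary values) are used.
    """
    n = source.shape[0]
    b = source.astype(float)
    b[0, :] -= psi_bnd[0, 1:-1] / h**2        # move known boundary values to the RHS
    b[-1, :] -= psi_bnd[-1, 1:-1] / h**2
    b[:, 0] -= psi_bnd[1:-1, 0] / h**2
    b[:, -1] -= psi_bnd[1:-1, -1] / h**2
    sol = np.linalg.solve(poisson_matrix(n, h), b.ravel())
    return sol.reshape(n, n)

def central_point_field(source_full, psi_first_guess, h, n=9):
    """Step (5): shift the window one grid point at a time and keep only the
    central value of each window solution; boundary values come from a first guess."""
    ny, nx = source_full.shape
    half = n // 2
    out = psi_first_guess.astype(float)
    for r in range(half + 1, ny - half - 1):
        for c in range(half + 1, nx - half - 1):
            r0, c0 = r - half, c - half          # first interior row/column of the window
            src = source_full[r0:r0 + n, c0:c0 + n]
            bnd = psi_first_guess[r0 - 1:r0 + n + 1, c0 - 1:c0 + n + 1]
            out[r, c] = solve_window(src, bnd, h)[half, half]
    return out
```

The window solves are mutually independent. Steps (6) and (7) could therefore distribute them over processors, although this sketch runs them serially.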

Under consideration of 9 intermediate levels, bounded by a ground level and a top
level, the fundamental integration region comprises 729 $\omega$-unknowns. The
solution process to obtain $\omega$ is the following:

(1) Formulation of finite difference equations for each interior grid point.

(2) Computation of the coefficient functionals and of the forcing function,
using the diabatic heating, including its radiation component, compatible with
the grid resolution.

(3) Establishment of the matrix equation (11), with $\omega = 0$ at the upper
boundary and at the lateral boundaries, and with the lower boundary value

$$\omega_g = c\left(u_g\,\frac{\partial \Phi_g}{\partial x} + v_g\,\frac{\partial \Phi_g}{\partial y}\right)$$

where $c$ is a constant, $u_g$ and $v_g$ are the wind components at ground level,
and $\Phi_g$ is the geopotential of the ground, commensurate with the horizontal
grid resolution. As an approximation, $u_g$ and $v_g$ may be replaced by the
values obtained at the lowest level.

(4) Solution of eq. (11) as eq. (12), but only for central interior points,
resulting in $\omega$ determinations.

(5) Translational shifts and parallel processing for separate aggregate
regions, in accordance with those applicable to the $\psi$ determination.
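The following is a hedged sketch of steps (3)-(4) for the 9 × 9 × 9 = 729-unknown region. It rests on several assumptions. The constant-coefficient quasi-geostrophic operator $\sigma\nabla^2\omega + f_0^2\,\partial^2\omega/\partial p^2$ stands in for eq. (2). The lateral and upper boundaries have $\omega = 0$, and a lower-boundary $\omega$ is prescribed as in step (3). The values of $\sigma$, $f_0$, the grid spacing and $\Delta p$ are illustrative only, and the function names are hypothetical.

```python
import numpy as np

def second_diff(n, h):
    """1-D second-difference matrix for n interior points (Dirichlet ends)."""
    return (np.diag(np.full(n, -2.0)) + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1)) / h**2

def omega_operator(n, h, nlev, dp, sigma, f0):
    """sigma * (d2/dx2 + d2/dy2) + f0**2 * d2/dp2 on an nlev x n x n interior grid.

    Unknowns are ordered (level, row, column), row-major; level 0 is nearest the top.
    """
    Dx, Dp = second_diff(n, h), second_diff(nlev, dp)
    Ix = np.eye(n)
    lap_h = np.kron(Ix, Dx) + np.kron(Dx, Ix)          # horizontal Laplacian (n*n)
    return sigma * np.kron(np.eye(nlev), lap_h) + f0**2 * np.kron(Dp, np.eye(n * n))

def solve_omega(forcing, omega_ground, n, h, nlev, dp, sigma, f0):
    """omega = 0 at the top and lateral boundaries; omega_ground is prescribed
    one level below the lowest interior level (index nlev - 1)."""
    A = omega_operator(n, h, nlev, dp, sigma, f0)
    b = forcing.astype(float)
    b[-1] -= f0**2 * omega_ground / dp**2              # move ground values to the RHS
    return np.linalg.solve(A, b.ravel()).reshape(nlev, n, n)
```

The Kronecker-product assembly makes the band structure mentioned below explicit. A sparse solver would exploit that structure, whereas this sketch uses a dense one.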

The solution process to obtain improved stream functions is:

(1) Computation of $\chi$ from $\nabla^2 \chi = -\partial \omega / \partial p$.

(2) Computation of $u_\chi$ and $v_\chi$ from $\chi$.

(3) Computation of the total wind components $u$ and $v$.

(4) Determination of the omitted terms of eq. (1) by finite difference methods,
yielding a corrective source function on the right side.

(5) Computation of the improved stream function $\psi^{(1)}$ by eq. (13), but
only for central interior points.

(6) Use of eq. (13) for all computation grid points, i.e., with variable grid
indices $i$ and $j$.
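Steps (1) and (2) follow from the continuity equation in pressure coordinates, $\partial u/\partial x + \partial v/\partial y + \partial\omega/\partial p = 0$. Together with eq. (4), it gives $\nabla^2\chi = -\partial\omega/\partial p$. The minimal sketch below makes three assumptions. It forms $-\partial\omega/\partial p$ by a centered difference between the levels above and below, with pressure increasing downward and the levels $2\Delta p$ apart. It sets $\chi = 0$ on the lateral boundary. It then obtains the divergent wind by centered differences. The names are hypothetical.

```python
import numpy as np

def laplacian_matrix(n, h):
    """5-point Laplacian on an n x n interior grid with zero boundary values."""
    T = np.diag(np.full(n, -2.0)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    eye = np.eye(n)
    return (np.kron(eye, T) + np.kron(T, eye)) / h**2

def divergent_wind(omega_above, omega_below, dp, h):
    """Step (1): lap(chi) = D = -d(omega)/dp; step (2): u_chi, v_chi from chi."""
    n = omega_above.shape[0]
    div = -(omega_below - omega_above) / (2.0 * dp)   # centered -d(omega)/dp
    chi = np.linalg.solve(laplacian_matrix(n, h), div.ravel()).reshape(n, n)
    chi_pad = np.pad(chi, 1)                          # zero boundary values (assumed)
    u_chi = (chi_pad[1:-1, 2:] - chi_pad[1:-1, :-2]) / (2 * h)
    v_chi = (chi_pad[2:, 1:-1] - chi_pad[:-2, 1:-1]) / (2 * h)
    return chi, u_chi, v_chi
```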

If indicated by experimentation, an improved solution may be attempted,
which can also be achieved in a differential form without new matrix inversions.


The solution of a system of linear equations with a number of unknowns of the
order of $10^3$ can be accomplished without time constraints by the new generation
of supercomputers, which are a requisite for the timely and accurate solution of
the omega equation. According to Elmer-DeWitt (1988), the Cray-3 will be released
in 1989, soon followed by the Cray-4. IBM and AT&T Bell Laboratories are on the
verge of introducing new parallel-processing computers, and Sandia National
Laboratories has successfully exploited a 1024-processor computer. Fortunately,
the matrices to be inverted are essentially band matrices with numerous zero
elements. Care has to be exercised to further the stability of the solution, both
in regard to ellipticity and to accurate coefficient functions of the omega
equation, notably of the static stability. Additionally, round-off errors have to
be controlled. Whether the application of Gauss' transformation, matrix splitting,
and successive overrelaxation methods is promising, only experimentation can tell.
If sufficiently accurate predicted $\Phi$-data are available at the initialization
time $t_0$, these might be used as lateral boundary values, with a resulting
decrease of the fundamental computational region.
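To illustrate two points from the paragraph above, here is a small sketch on the 81-unknown model Poisson problem. The first point is the band structure of the matrices. The second is successive overrelaxation (SOR) as an alternative to direct inversion. The relaxation factor is the classical optimum for this model problem, not a value derived for the omega equation, and the names are hypothetical.

```python
import numpy as np

def poisson_matrix(n, h):
    """5-point Laplacian on an n x n interior grid (zero boundary values)."""
    T = np.diag(np.full(n, -2.0)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    eye = np.eye(n)
    return (np.kron(eye, T) + np.kron(T, eye)) / h**2

def bandwidth(A):
    """Largest |i - j| over the nonzero entries of A."""
    rows, cols = np.nonzero(A)
    return int(np.max(np.abs(rows - cols)))

def optimal_omega(n):
    """Classical optimal SOR factor for the model Poisson problem, n interior points per side."""
    return 2.0 / (1.0 + np.sin(np.pi / (n + 1)))

def sor_poisson(source, h, omega, tol=1e-10, max_iter=10_000):
    """SOR for lap(psi) = source with psi = 0 on the boundary; returns (psi, iterations)."""
    n = source.shape[0]
    psi = np.zeros((n + 2, n + 2))
    for it in range(max_iter):
        max_change = 0.0
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # Gauss-Seidel value from the 5-point stencil, then overrelax.
                gs = 0.25 * (psi[i - 1, j] + psi[i + 1, j] + psi[i, j - 1]
                             + psi[i, j + 1] - h * h * source[i - 1, j - 1])
                change = omega * (gs - psi[i, j])
                psi[i, j] += change
                max_change = max(max_change, abs(change))
        if max_change < tol:
            return psi[1:-1, 1:-1], it + 1
    raise RuntimeError("SOR did not converge")
```

For a 9 × 9 grid the matrix has bandwidth 9. Most of its 81 × 81 entries are zero.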

III. INCORPORATION OF FRICTION. The inclusion of the frictional terms
$F_x$ and $F_y$ in the surface layer (0-100 m) and in the layer above it
(100-1000 m) precludes a good determination of filtered winds from the
geopotential by means of eqs. (1) and (2). The structure of these terms has
been discussed, among others, by Kasahara (1977) and Corby, Gilchrist and
Rowntree (1977). The frictional terms increase the divergence in comparison
with the relative vorticity and tend to make the flow hydrodynamically unstable,
particularly in low latitudes. Only in the stationary case, in which the time
derivative of the velocity vector $\mathbf{V}$ vanishes, and under consideration
of simplified frictional terms, may approximate wind components be computed from
the geopotential. The general filter equations (1) and (2) are fully effective
only in the free atmosphere.
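The stationary case can be made concrete with a minimal sketch. It assumes simplified linear (Rayleigh) friction $\mathbf{F} = -r\mathbf{V}$, a hypothetical stand-in for the frictional terms, and solves the stationary balance $0 = f v - \partial\Phi/\partial x - r u$, $0 = -f u - \partial\Phi/\partial y - r v$ for $u$ and $v$ at one point. With $r = 0$ it reduces to the geostrophic wind.

```python
import numpy as np

def frictional_wind(dphi_dx, dphi_dy, f, r):
    """Stationary balance with linear drag r (an assumed friction law).

    Rows: -r*u + f*v = dPhi/dx and -f*u - r*v = dPhi/dy.
    """
    A = np.array([[-r, f], [-f, -r]])
    u, v = np.linalg.solve(A, np.array([dphi_dx, dphi_dy]))
    return u, v
```

For $f = 10^{-4}\ \mathrm{s^{-1}}$ and $\partial\Phi/\partial y = -10^{-3}\ \mathrm{m\,s^{-2}}$, the geostrophic wind is 10 m/s from the west. With $r = f$, the flow turns toward low geopotential and weakens, giving $u = v = 5$ m/s.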

IV. NEW EARTH OBSERVING SYSTEMS. The lack of a well-synchronizable, dense,
and accurate global data base has been the greatest drawback for numerical
weather prediction. New earth observing systems under development, and assumed
to be operational in the foreseeable future, can be expected, in conjunction
with improved, modified objective analysis, improved initialization and
prognostic models, and supercomputers, to revolutionize numerical weather prediction.

The new systems of particular interest are the subject of the LAWS, HMMR,
and LASA Instrument Panel Reports (1987), published by the National
Aeronautics and Space Administration. The performance characteristics of the
above systems are:

LAWS - LASER ATMOSPHERIC WIND SOUNDER (DOPPLER LIDAR)
100 KM HORIZONTAL RESOLUTION
1 KM VERTICAL RESOLUTION
1-2 M/S LOWER TROPOSPHERE
2-5 M/S UPPER TROPOSPHERE
CLOUD COVER AND RAIN REMAIN OBSTACLES

HMMR - HIGH RESOLUTION MULTIFREQUENCY MICROWAVE RADIOMETER
IMPROVED TEMPERATURE PROFILES (±1 K)
IMPROVED HUMIDITY PROFILES (±10 PERCENT)
HORIZONTAL RESOLUTION 10 KM
VERTICAL RESOLUTION 0.5 KM

LASA - LIDAR ATMOSPHERIC SOUNDER AND ALTIMETER
SURFACE PRESSURE (±2 MB)
VERTICAL PROFILES OF TEMPERATURE AND PRESSURE FROM THE STRATOSPHERE THROUGH THE TROPOSPHERE TO THE GROUND
TEMPERATURE IN TROPOSPHERE AND STRATOSPHERE

The prognostic horizontal grid resolution should be smaller than 100 km to
minimize forecast errors, especially in medium- to long-range predictions.

V. USE OF FILTER EQUATIONS AS CONDITIONAL EQUATIONS. The filter equations
(1) and (2) will still be useful after the introduction of LAWS, since cloud
cover and rain present obstacles to wind measurements, or to accurate wind
measurements.

The  first  filter  equation  or  both  filter  equations  may  be  used  in  connection 
with  "measured"  winds  and  geopotentials  to  obtain  improved  initial  fields. 

In a first application, optimally smoothed LAWS wind measurements in the lower
troposphere may be used to determine $\Phi(u, v)$ by virtue of eq. (1). Let the
"measured," optimally smoothed geopotential be $\Phi_m$. An improved geopotential
would then result as

$$\hat{\Phi} = K_1\,\Phi_m + K_2\,\Phi(u, v) \qquad (14)$$

where $K_1$ and $K_2$ are regression coefficients.
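In the paper, the weights of the merged geopotential are regression coefficients. One common way to fix such weights can be sketched minimally, and it is a swapped-in technique rather than the paper's estimation procedure. It assumes unbiased, mutually uncorrelated errors of known variance in the measured and computed fields, and it constrains the two weights to sum to 1. The variance-minimizing weights are then inversely proportional to the error variances. The function names are hypothetical.

```python
import numpy as np

def min_variance_weights(var_measured, var_computed):
    """K1, K2 (K1 + K2 = 1) minimizing the error variance of K1*phi_m + K2*phi_c."""
    total = var_measured + var_computed
    return var_computed / total, var_measured / total

def combined_variance(var_measured, var_computed):
    """Error variance of the merged field for the weights above."""
    k1, k2 = min_variance_weights(var_measured, var_computed)
    return k1**2 * var_measured + k2**2 * var_computed

def merge(phi_m, phi_c, var_measured, var_computed):
    k1, k2 = min_variance_weights(var_measured, var_computed)
    return k1 * phi_m + k2 * phi_c
```

Suppose the measured field has error variance 4 and the computed field 1. The weights are then 0.2 and 0.8, and the merged variance is 0.8, which is smaller than either input variance.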


In a second application, LAWS wind measurements in the upper troposphere
may be improved, and to a lesser extent the geopotential, in the following
sequence:

(1) $\Phi(u, v)$ is determined from eq. (1).

(2) An improved $\Phi$ is calculated according to eq. (14).

(3) $\omega$ is computed by means of eq. (2).

(4) It is then possible to formulate an improved $\omega$ as a regression
combination of the available $\omega$ estimates, with regression coefficients
as weights.

(5) $\chi$, $u_\chi$, and $v_\chi$ are computed from the improved $\omega$, and
$\psi$, $u$, and $v$ are then computed in turn.

(6) An updated $\Phi(u, v)$ is determined from eq. (1).

The above procedure would simultaneously provide a multiple consistency check,
and the integration domain for the solution of eqs. (1) and (2) could be
reduced.


A third application would be the estimation of an improved geopotential from
the "measured" geopotential and from the geopotential computed by eq. (1), with
regression coefficients as weights. Potentially, a multiple regression approach
could be used. Merging a measured and a computed field would only be warranted
in the case of relatively large errors in the measured field that are not uniform
because of the actual estimation of $\Phi$ in the context of objective
analysis, addressed in section VI. The above approach has been suggested
by Hoffman and Kalnay (1983). The improved $\Phi$ field can then be
used for initialization by means of eqs. (1) and (2).


In constrained initialization according to Haltiner and Williams (1980),
the integral

$$I = \iint_S \left[\alpha\left((u-\tilde{u})^2 + (v-\tilde{v})^2\right) + \beta\,(\Phi-\tilde{\Phi})^2 + 2\lambda M\right] dx\,dy \qquad (15)$$

is minimized by a variational method. In eq. (15), $\alpha$ and $\beta$ are generally
latitude-dependent weights, $\tilde{u}$, $\tilde{v}$, and $\tilde{\Phi}$ denote fields
obtained from objective analysis, $\lambda$ is a variable Lagrangian multiplier, $M$ is
the classical truncated balance equation, and $S$ is the integration area. Variation
of $I$ under neglect of nonlinear terms in $M$ yields two differential equations which,
together with $M$, permit the determination of $\lambda$ and of improved fields of
$\Phi$ and $\psi$.
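A discrete analogue of this variational adjustment with a linear constraint can be sketched under several assumptions. Diagonal weights stand in for $\alpha$ and $\beta$, and a hypothetical one-dimensional geostrophic constraint $f u + \partial\Phi/\partial y = 0$ replaces the truncated balance equation $M$. The sketch minimizes $\sum_k w_k (x_k - \tilde{x}_k)^2$ subject to $C x = d$. It uses the Lagrange-multiplier solution $x = \tilde{x} - W^{-1}C^{T}\lambda$, where $\lambda = (C W^{-1} C^{T})^{-1}(C\tilde{x} - d)$.

```python
import numpy as np

def constrained_adjustment(x_obs, weights, C, d):
    """Minimize sum(weights * (x - x_obs)**2) subject to C @ x = d."""
    w_inv = 1.0 / weights
    cwc = (C * w_inv) @ C.T                       # C W^-1 C^T
    lam = np.linalg.solve(cwc, C @ x_obs - d)     # Lagrange multipliers
    return x_obs - w_inv * (C.T @ lam), lam

def geostrophic_constraint(n, f, dy):
    """Rows f*u_i + (phi_i - phi_{i-1})/dy = 0, i = 1..n, for x = [u_1..u_n, phi_0..phi_n]."""
    C = np.zeros((n, 2 * n + 1))
    for i in range(n):
        C[i, i] = f
        C[i, n + i + 1] = 1.0 / dy
        C[i, n + i] = -1.0 / dy
    return C, np.zeros(n)
```

Larger weights on one variable make the adjustment move that variable less. This mirrors the role of $\alpha$ and $\beta$ in eq. (15).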

The above approach has some merit in the context of the present wind and
geopotential coverage and of wind and geopotential estimation by objective analysis.
It is not satisfactory in the case of sufficiently dense and accurate $\Phi$-fields
and their use in eqs. (1) and (2), and it is not required for the future, more
accurate wind and geopotential determinations associated with new earth observing
systems. Substitution of eq. (5) for the truncated classical balance equation $M$
would result in somewhat improved $\Phi$ and $\psi$ solutions.

VI. OBJECTIVE ANALYSIS. Present objective analysis concentrates on the
estimation of the geopotential and, secondarily, on the estimation of winds,
using "measured" and generally not uniformly distributed data. The estimation
of the geopotential can be characterized as

(16) [equation illegible in the source]

In eq. (16), the variables are deviations from climatological means, the error
terms are uncorrelated measurement errors, and the weights are regression
coefficients. The subscript $m$ indicates estimation at point $m$. The multivariate
estimation (16) is generally performed at one isobaric level, although it
can be extended to other, reasonably close isobaric levels. Covariance analysis
of the geostrophic relationship permits the estimation of the required
covariances, with the climatological mean as reference. A recent review of
methods of objective analysis has been made by Gustafsson (1981).
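The following minimal sketch shows how regression weights of the kind in eq. (16) are commonly obtained in optimal interpolation. It is univariate, so it omits the wind terms of the multivariate estimate. It also assumes a Gaussian covariance model and uncorrelated observation errors. The weights solve $(B + R)\,a = b$. Here $B$ holds the covariances between observation points, $R$ the observation-error variances, and $b$ the covariances between the observation points and the target point $m$. All names and parameter values are illustrative.

```python
import numpy as np

def gaussian_cov(dist, var_b, length):
    """Assumed covariance model for geopotential deviations."""
    return var_b * np.exp(-0.5 * (dist / length) ** 2)

def oi_weights(obs_xy, target_xy, var_b, var_o, length):
    """Weights a_i minimizing the expected squared error of
    sum_i a_i * (phi'_i + eps_i) as an estimate of phi'_m."""
    d_oo = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    d_om = np.linalg.norm(obs_xy - target_xy, axis=-1)
    B = gaussian_cov(d_oo, var_b, length) + var_o * np.eye(len(obs_xy))
    b = gaussian_cov(d_om, var_b, length)
    return np.linalg.solve(B, b)
```

Consider a single observation at the target point with deviation variance 4 and error variance 1. Its weight is 4/(4 + 1) = 0.8.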

The approach (16) is not satisfactory for the future, more accurate
and more uniformly distributed "measured" winds and geopotentials, and because
the meteorological generation process is neither ergodic nor stationary.

Newly  developed  advanced  smoothing  techniques  such  as  those  addressed  by 
Adams,  Willsky,  and  Levy  (1984)  would  be  more  appropriate. 

VII. NONHYDROSTATIC APPROACH. Highly accurate and dense measurements of
pertinent meteorological variables, provided by new earth observing systems,
in combination with advanced smoothing techniques, might permit the
replacement of the primitive equations with a nonhydrostatic prediction system as
outlined by Baussus von Luetzow (1980). This system, with $z$ as the vertical
coordinate, has a more complicated diagnostic equation for the vertical wind
component $w$ and leaves the continuity equation invariant, i.e., it introduces
an additional degree of freedom. To be highly effective, however, application of
the $w$-equation requires improved condensation criteria and, additionally, an
improved parameterization of moist cumulus convection.

Only then can the initial humidity field be fully exploited. In agreement
with Anthes, Kuo, Baumhefner, Errico, and Bettge (1985), the nonhydrostatic
system would be able to cope with mesoscale phenomena, including frontal
and jetlike discontinuities, flows produced in response to small-scale
topographic forcing, and large-amplitude instabilities such as convective storms.
VIII. CONCLUSION.


(1)  The  wind  field  can  be  satisfactorily  computed  from  the  geopotential 
field  in  the  free  atmosphere  and  outside  the  equatorial  belt  by  the  numerical 
solution  of  a  system  of  two  partial  differential  equations,  using  supercomputers 
and  parallel  processing.  This  method  is  also  promising  in  the  case  of  Doppler 
Lidar  failure. 

(2) The optimal diagnostic filter equations permit the determination of
improved geopotential and wind fields in the case of both uniform and dense
"measured" wind and geopotential coverage, particularly in the upper troposphere.

(3)  New  earth  observing  systems  with  the  capability  to  provide  uniform, 
dense,  and  more  accurate  determinations  of  meteorological  variables  and 
advanced  smoothing  techniques  permit  the  application  of  a  nonhydrostatic 
prediction  system  which  could  fully  exploit  the  availability  of  the  initial 
humidity  field. 
Introduction


This work concerns Lie transforms, a method for
obtaining approximate solutions to systems of differential
equations. We apply the method to a general class of two-degree-of-freedom
Hamiltonian systems, viz., two coupled nonlinear
oscillators with nonresonant frequencies. For systems in this
class, we use Lie transforms to approximately reduce the system
to an equivalent simpler system which is immediately solvable,
i.e., a system with ignorable coordinates.

As an application of our results, we determine the
nonlinear stability of the triangular points in the circular
restricted three body problem. In doing so we corroborate a
computation recently performed by Meyer and Schmidt [16]. Their
computation was based on their own computer algebra program
written in PL/I, whereas the present work is based on readily
available utilities written in MACSYMA [19]. Moreover, while
their computation was specifically performed for the problem at
the triangular point $L_4$, the present work applies to a problem
with arbitrary (symbolic) coefficients.

We begin by introducing the reader to Lie transforms.
Then we show how the method may be applied to a particular class
of problems, and finally we specialize the results to some
examples, including the problem at $L_4$.
Lie Transforms


In this section we summarize the method of Lie
transforms (see [8], [12], [15], [17]). This work is concerned
with Hamiltonian systems, i.e., systems which are derivable from
a single scalar function $H$, the Hamiltonian:


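$$\frac{dx_m}{dt} = \frac{\partial H}{\partial y_m}, \qquad \frac{dy_m}{dt} = -\frac{\partial H}{\partial x_m} \qquad (1)$$

where $x_m$ and $y_m$ are the dependent variables of the problem,
$m = 1, \ldots, N$, and $N$ is called the number of degrees of freedom.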

The method of Lie transforms generates a near-identity
transformation from $(x_m, y_m)$ to $(X_m, Y_m)$ variables,

$$\begin{aligned} x_m &= X_m + (\text{quadratic terms in } (X_j, Y_j)) + (\text{cubic terms in } (X_j, Y_j)) + \cdots \\ y_m &= Y_m + (\text{quadratic terms in } (X_j, Y_j)) + (\text{cubic terms in } (X_j, Y_j)) + \cdots \end{aligned} \qquad (2)$$

which is canonical, i.e., which preserves the Hamiltonian form
of the equations:


$$\frac{dX_m}{dt} = \frac{\partial K}{\partial Y_m}, \qquad \frac{dY_m}{dt} = -\frac{\partial K}{\partial X_m} \qquad (3)$$

where $K = K(X_m, Y_m) = H(x_m, y_m)$ is the Hamiltonian in the new
variables (called the Kamiltonian after Goldstein [11]).

The near-identity transformation is generated by first
introducing a scaling parameter $\varepsilon$ into the problem. Expanding $H$
in a power series about the origin (assumed to be an equilibrium
position),
$$H = H_0(x_m, y_m) + \varepsilon\,H_1(x_m, y_m) + \varepsilon^2 H_2(x_m, y_m) + \cdots \qquad (4)$$

where $H_n(x_m, y_m)$ is a polynomial of degree $n+2$. Then the
near-identity transformation is generated by the associated
Hamiltonian system

$$\frac{dx_m}{d\varepsilon} = \frac{\partial W}{\partial y_m}, \qquad \frac{dy_m}{d\varepsilon} = -\frac{\partial W}{\partial x_m} \qquad (5)$$

in which $\varepsilon$ plays the role of time. The transformation evolves
in $\varepsilon$, starting with the initial conditions

$$x_m = X_m, \qquad y_m = Y_m \qquad \text{at } \varepsilon = 0 \qquad (6)$$

The Hamiltonian $W$ of eqs. (5), called the generating function, is
also expanded in a power series in $\varepsilon$:

$$W = W_1 + \varepsilon\,W_2 + \varepsilon^2 W_3 + \cdots \qquad (7)$$

where $W_n$ is a polynomial of degree $n+2$. The point of this
generating scheme is that the resulting transformation is
canonical for any choice of the $W_n$'s (see [8], [15]). The
actual choice of these functions depends upon the problem at
hand, but the main idea is to pick them so that the new
Hamiltonian $K$ is as simple as possible. We note that the
parameter $\varepsilon$ in this paper corresponds to $-\varepsilon$ in [15] and [19].
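The canonicity claim can be checked numerically in one degree of freedom. The minimal sketch below integrates a system of the form of eqs. (5) and (6) with a fourth-order Runge-Kutta scheme, for a single, hypothetical cubic generator $W_1 = X^2 Y$. It then evaluates the Jacobian determinant of the map $(X, Y) \mapsto (x, y)$ by finite differences, and a canonical map in one degree of freedom must preserve area. For this $W_1$ the flow is known in closed form, $x = X/(1 - \varepsilon X)$ and $y = Y(1 - \varepsilon X)^2$, which gives an independent check.

```python
import numpy as np

def flow(grad_w, x, y, eps, steps=200):
    """RK4 integration of dx/de = dW/dy, dy/de = -dW/dx from e = 0 to e = eps.
    grad_w(x, y) returns (dW/dx, dW/dy); W is autonomous in this sketch."""
    h = eps / steps

    def rhs(p):
        wx, wy = grad_w(p[0], p[1])
        return np.array([wy, -wx])

    p = np.array([x, y], dtype=float)
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        p = p + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p

def jacobian_det(grad_w, x, y, eps, d=1e-6):
    """Central-difference determinant of the map (X, Y) -> (x, y)."""
    col_x = (flow(grad_w, x + d, y, eps) - flow(grad_w, x - d, y, eps)) / (2 * d)
    col_y = (flow(grad_w, x, y + d, eps) - flow(grad_w, x, y - d, eps)) / (2 * d)
    return col_x[0] * col_y[1] - col_y[0] * col_x[1]

def grad_w1(x, y):
    """Gradient of the hypothetical generator W1 = x**2 * y."""
    return 2 * x * y, x**2
```

Expanding the closed form gives $x = X + \varepsilon X^2 + \varepsilon^2 X^3 + \cdots$. This agrees with the second-order Lie-transform result $X + \varepsilon\,\partial W_1/\partial Y + \tfrac{\varepsilon^2}{2}\{\partial W_1/\partial Y, W_1\}$.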

The transformation is generated by expanding the
variables in Taylor series in $\varepsilon$ and using the generating
equations (5)-(7) to evaluate the coefficients:
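$$x_m = x_m\Big|_{\varepsilon=0} + \varepsilon\,\frac{dx_m}{d\varepsilon}\Big|_{\varepsilon=0} + \frac{\varepsilon^2}{2}\,\frac{d^2x_m}{d\varepsilon^2}\Big|_{\varepsilon=0} + \cdots, \quad \text{and similarly for } y_m,$$

$$\frac{dx_m}{d\varepsilon}\Big|_{\varepsilon=0} = \frac{\partial W_1}{\partial Y_m}, \qquad \frac{dy_m}{d\varepsilon}\Big|_{\varepsilon=0} = -\frac{\partial W_1}{\partial X_m},$$

$$\frac{d^2x_m}{d\varepsilon^2}\Big|_{\varepsilon=0} = \sum_j \left(\frac{\partial^2 W_1}{\partial X_j\,\partial Y_m}\,\frac{\partial W_1}{\partial Y_j} - \frac{\partial^2 W_1}{\partial Y_j\,\partial Y_m}\,\frac{\partial W_1}{\partial X_j}\right) + \frac{\partial W_2}{\partial Y_m} = \frac{\partial W_2}{\partial Y_m} + \left\{\frac{\partial W_1}{\partial Y_m}, W_1\right\},$$

$$\frac{d^2y_m}{d\varepsilon^2}\Big|_{\varepsilon=0} = -\sum_j \left(\frac{\partial^2 W_1}{\partial X_j\,\partial X_m}\,\frac{\partial W_1}{\partial Y_j} - \frac{\partial^2 W_1}{\partial Y_j\,\partial X_m}\,\frac{\partial W_1}{\partial X_j}\right) - \frac{\partial W_2}{\partial X_m} = -\frac{\partial W_2}{\partial X_m} - \left\{\frac{\partial W_1}{\partial X_m}, W_1\right\},$$

where the Poisson or Lie bracket $\{f, g\}$ is given by

$$\{f, g\} = \sum_j \left(\frac{\partial f}{\partial X_j}\,\frac{\partial g}{\partial Y_j} - \frac{\partial f}{\partial Y_j}\,\frac{\partial g}{\partial X_j}\right).$$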

The  transformation  is  thus  found  to  be  given  by 


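$$x_m = X_m + \varepsilon\,\frac{\partial W_1}{\partial Y_m} + \frac{\varepsilon^2}{2}\left[\frac{\partial W_2}{\partial Y_m} + \left\{\frac{\partial W_1}{\partial Y_m}, W_1\right\}\right] + \cdots \qquad (13)$$

and similarly,

$$y_m = Y_m - \varepsilon\,\frac{\partial W_1}{\partial X_m} - \frac{\varepsilon^2}{2}\left[\frac{\partial W_2}{\partial X_m} + \left\{\frac{\partial W_1}{\partial X_m}, W_1\right\}\right] + \cdots \qquad (14)$$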


In order to obtain the transformed Kamiltonian $K$ (cf.
eq. (3)), the transformation (13), (14) is substituted into a
power series expansion for the original Hamiltonian $H$:


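$$K(X_m, Y_m) = H(x_m, y_m) = H_0(x_m, y_m) + \varepsilon\,H_1(x_m, y_m) + \varepsilon^2 H_2(x_m, y_m) + \cdots \qquad (15)$$

$$H_0(x_m, y_m) = H_0\!\left(X_m + \varepsilon\frac{\partial W_1}{\partial Y_m} + \cdots,\; Y_m - \varepsilon\frac{\partial W_1}{\partial X_m} - \cdots\right) = H_0\Big|_{\varepsilon=0} + \varepsilon\,\frac{dH_0}{d\varepsilon}\Big|_{\varepsilon=0} + \frac{\varepsilon^2}{2}\,\frac{d^2H_0}{d\varepsilon^2}\Big|_{\varepsilon=0} + \cdots \qquad (16)$$

$$H_0\Big|_{\varepsilon=0} = H_0(X_m, Y_m) \qquad (17)$$

$$\frac{dH_0}{d\varepsilon}\Big|_{\varepsilon=0} = \sum_j\left(\frac{\partial H_0}{\partial x_j}\frac{dx_j}{d\varepsilon} + \frac{\partial H_0}{\partial y_j}\frac{dy_j}{d\varepsilon}\right)\Big|_{\varepsilon=0} = \sum_j\left(\frac{\partial H_0}{\partial X_j}\frac{\partial W_1}{\partial Y_j} - \frac{\partial H_0}{\partial Y_j}\frac{\partial W_1}{\partial X_j}\right) = \{H_0, W_1\} \qquad (18)$$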

where  the  generating  eqs.(5)-(7)  have  been  used.  This  gives 
$$H_0(x_m, y_m) = H_0(X_m, Y_m) + \varepsilon\,\{H_0, W_1\} + \frac{\varepsilon^2}{2}\left[\{\{H_0, W_1\}, W_1\} + \{H_0, W_2\}\right] + \cdots \qquad (19)$$


This equation, which represents the expansion of $H_0$ under the
near-identity transformation (13), (14), also holds for any of
the $H_n$'s, and in fact is valid for any function $f(x_m, y_m)$.
Substitution of (19) and the corresponding equations for the other
$H_n(x_m, y_m)$ into eq. (15) gives, after some simplification:
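Here is a small SymPy check of the series in eq. (19) for a case with an exactly known flow. It uses one degree of freedom and a single illustrative quadratic generator $W = XY$, which is not of the degree $n+2$ used in the text. Under eqs. (5) and (6) this generator gives $x = X e^{\varepsilon}$ and $y = Y e^{-\varepsilon}$. The nested-bracket series for $f = X^2 Y$ should therefore reproduce the Taylor series of $f(x, y) = X^2 Y e^{\varepsilon}$. The function names are hypothetical.

```python
import sympy as sp

X, Y, eps = sp.symbols("X Y epsilon")

def bracket(f, g):
    """{f, g} = df/dX dg/dY - df/dY dg/dX (one degree of freedom)."""
    return sp.expand(sp.diff(f, X) * sp.diff(g, Y) - sp.diff(f, Y) * sp.diff(g, X))

def lie_series(f, w, order):
    """f + eps {f, W} + eps^2/2 {{f, W}, W} + ... for a single generator W."""
    term, total = f, f
    for k in range(1, order + 1):
        term = bracket(term, w)
        total += eps**k / sp.factorial(k) * term
    return sp.expand(total)

# Illustrative quadratic generator: its flow is x = X*exp(eps), y = Y*exp(-eps).
W = X * Y
f = X**2 * Y
exact = (X * sp.exp(eps)) ** 2 * (Y * sp.exp(-eps))
series = lie_series(f, W, 6)
```

Here every bracket $\{X^2Y, XY\}$ returns $X^2Y$, so the series is $X^2Y(1 + \varepsilon + \varepsilon^2/2 + \cdots)$.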


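$$K(X_m, Y_m) = K_0(X_m, Y_m) + K_1(X_m, Y_m)\,\varepsilon + K_2(X_m, Y_m)\,\varepsilon^2 + \cdots \qquad (20)$$

where

$$K_0 = H_0 \qquad (21)$$

$$K_1 = H_1 + \{H_0, W_1\} \qquad (22)$$

$$K_2 = H_2 + \tfrac{1}{2}\{H_0, W_2\} + \tfrac{1}{2}\{K_1, W_1\} + \tfrac{1}{2}\{H_1, W_1\} \qquad (23)$$

$$K_3 = H_3 + \tfrac{1}{3}\{H_0, W_3\} + \tfrac{1}{3}\{K_1, W_2\} + \tfrac{1}{3}\{K_2, W_1\} + \tfrac{1}{6}\{H_1, W_2\} + \tfrac{2}{3}\{H_2, W_1\} + \tfrac{1}{6}\{\{H_1, W_1\}, W_1\} \qquad (24)$$

$$\begin{aligned} K_4 = {}& H_4 + \tfrac{1}{4}\{H_0, W_4\} + \tfrac{1}{4}\{K_1, W_3\} + \tfrac{1}{4}\{K_2, W_2\} + \tfrac{1}{4}\{K_3, W_1\} \\ &+ \tfrac{1}{12}\{H_1, W_3\} + \tfrac{1}{4}\{H_2, W_2\} + \tfrac{3}{4}\{H_3, W_1\} + \tfrac{1}{12}\{\{H_1, W_1\}, W_2\} \\ &+ \tfrac{1}{24}\{\{H_1, W_2\}, W_1\} + \tfrac{1}{4}\{\{H_2, W_1\}, W_1\} + \tfrac{1}{24}\{\{\{H_1, W_1\}, W_1\}, W_1\} \end{aligned} \qquad (25)$$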
In eqs. (21)-(25), the $H_n$ and $W_n$ are taken as functions of the
variables $X_m, Y_m$.

So we see that the method of Lie transforms is nothing
more than the introduction of the generating equations (5)-(7)
into Taylor series expansions for the variables $(x_m, y_m)$ and $H$.

However, the transformation equations (e.g., (21)-(25)) can be
generated much more efficiently than by the foregoing expansion
method. There are several schemes for doing so (including the
original method of Deprit [8], based on the "Lie triangle", and a
method of Dragt and Finn [10], based on infinite products rather
than infinite series), but we prefer the following method (see
[15]), which is easily implemented in MACSYMA ([13], [14],
[19]).

Define the operators $L_n$ and $S_n$ as follows:

$$L_n = \{\,\cdot\,, W_n\} \qquad (26)$$

$$S_0 = \mathrm{Id} \quad \text{(the identity operator)} \qquad (27.1)$$

$$S_n = \frac{1}{n}\sum_{m=0}^{n-1} L_{n-m}\,S_m, \qquad n = 1, 2, 3, \ldots \qquad (27.2)$$

Then the near-identity transformation from $(x_m, y_m)$ to $(X_m, Y_m)$
variables is given by

$$x_m = \left(S_0 + \varepsilon\,S_1 + \varepsilon^2 S_2 + \cdots\right) X_m \qquad (28.1)$$

$$y_m = \left(S_0 + \varepsilon\,S_1 + \varepsilon^2 S_2 + \cdots\right) Y_m \qquad (28.2)$$

and the $n$th term of the Kamiltonian is given by the expression

$$K_n = H_n + \frac{1}{n}\{H_0, W_n\} + \frac{1}{n}\sum_{m=1}^{n-1}\left[L_{n-m}K_m + m\,S_{n-m}H_m\right], \qquad n = 2, 3, 4, \ldots \qquad (29)$$

where the cases $n = 0, 1$ are given by eqs. (21), (22).


Coupled  Oscillators 


In  this  work  we  shall  apply  the  method  of  Lie  transforms 
to  two  degree  of  freedom  Hamiltonian  systems  in  which  Hq  has  the 

special  form: 

u  1/2^  2  2.  1,2^  2  2. 

(30)  Hq  =  ^  (Pj  +  Qi  )  -  2  (P2  “2  **2  ^ 


where  and  p^  are  variables  representing  the  displacement  tind 

momentum  of  oscillator  m.  For  e  =  0.  the  equations  of  motion 
corresponding  to  such  a  Hamiltonian  become 

(31)  and  ,  or  =  0. 

Thus  when  e  =  0.  the  system  has  eigenvalues  +  i  +  i  Ug. 
where  i  =  >^T.  and  we  change  variables  to  eigencoordinates 


(32) 


w  y 
m  m 


X  y 

m  ^  .  •'m 

Sn  ~  <j  2  ’  ^ro  ~  2 

m 


+  i  X 


m 


for  which  the  eqs.  of  motion  (31)  and  Hamiltonian  (30)  take  the 
form 


(33) 


X  =  i  (»)  X  and  y  =  -  i  u  y 
m  mm  ■'m  m  •'m 
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(34) 


Hq  =  i  0,1  Xi  Yi  -  i  0,2  X2  y. 


In  these  coordinates,  each  becomes  a  polynomial  of 
degree  n+2  in  the  four  variables  Xi,yi.X2.y2-  f^or  example, 
there  are  20  cubic  monomials  which  form  a  basis  for  Hi: 


3  2  2  2  2 

(35)  Hi  =  linear  combination  of  (Xj  ,Xi  X2.X1  Yi-^i  y2.XiX2  . 


2  2  3  2  2  3, 

^2^1  '^l^2'^7.  *^1  *^1  ^2'^l^2  -^2  > 


The  number  of  basis  monomials  for  H2.  H^  and  H^  are: 


Term 

»1 

H2 

«3 

H. 


Degree  No. of  basis  monomials 

3  20 

4  35 

5  56 

6  84 


We  now  come  to  the  question  of  how  to  choose  the 
generating  functions  so  as  to  best  simplifiy  the  Kamiltonians 

K^.  At  the  n*'^  step  of  the  method.  is  given  by  eq.(29). 


(36) 


K  =  —  {H„.W  }  +  terms  which  are  already  known 

n  n  ^  0  n^ 


Now  with  Hq  in  the  simplified  form  (34), 
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(37) 


aH-  aw  an.  aw  an^  aw  an^  aw 

^  y _ 0  n _ 0  n  0  n  0  n 

"  aXj  aVj  aVj  aXj  aXg  aVg  "  aVg  ax2 


r  aw 

aw  1 

r 

aw 

aw  1 

Iy  — a- 

X  — ^ 

-  ^  "2^ 

n 

L  1 

1  3XjJ 

2aY2- 

We  want  to  choose  W  so  that  this  linear  partial  differential 

n 

operator  on  W^  cancels  as  many  terms  as  possible  in  eq.(36). 

Each  term  to  be  cancelled  will  be  of  the  form 

(38)  A  X2*‘ 

where  A  is  a  constzint.  In  view  of  the  linearity  of  (37),  we 

choose  W  to  be  a  sum  of  terms,  one  for  each  term  (38)  to  be 
n 

cancelled,  of  the  form 

(39)  W^  =  B  Xjj  Yj^  Xg**  Yg® 

where  B  is  an  undetermined  constant.  Then 

S  =  5  ‘  ®  ’‘l’’  =‘2''  V 

leading  to  the  choice 

(41)  B  =  ..  ^  ^  , - r  .  n  =  j+l+r+s-2. 

Note  that  this  scheme  fails  if  the  denominator  of  (41) 
vanishes.  Assuming  that  the  frequencies  Uj  and  are 

incommensurable  (nonresonant),  the  denominator  will  vanish  only 
if  both 

(42)  1  =  j  and  s  =  r 
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Thus  we  cannot  remove  terms  of  the  form 


(43)  (Xj  (X2  ■ 

This  means  that  we  can  always  reduce  every  such  (nonresonant } 
problem  to  the  form: 

(44)  Kq  =  Hq  =  i  (Jj  (XjYj)  -  i  (^2  (X2Y2) 

(45)  Kj  =  0 

(«)  ^2  =  «2200  *  ‘‘an  (Vl»V2)  *  '<0022  (^2)^ 

(47)  K3  =  0 

'^4  '  '^300  *  '^n 

*  ■<n22  (>‘l''l)(V2)^  *  'Ws  (V2)® 

That  is,  every  such  nonresonant  two  degree  of  freedom  problem 
can,  to  0(4),  be  reduced  to  only  7  coefficients.  Note  that  in 
this  case  the  resulting  1 tonitm  is  a  function  only  of  the 
"action"  variables, 

(49)  ^1  =  '  ^1  ^1  ^2  =  ^  ^2  ^2 

and  hence  both  coordinates  are  ignorable  and  the  system  is 

immediately  solveable  to  0(4).  Such  a  system  is  said  to  be  in 

Birkhoff  normal  form  ([5],  p.85). 

By  inspection  of  eq.(41),  the  foregoing  scheme  fails  at 

special  resonant  values  of  u,  arod  {•)„.  In  solving  for  W  , 

L  z  n 

resonant  terms  occur  for  integer  values  of  and  k2  such  that 
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(50) 


kj  Wj  +  k2  «2  =  0  •  Ikj|  +  |k2l  ^  n  +  2 


In  such  cases  additional  non-removable  terms  occur.  We  shall 
not  consider  such  resonant  cases  in  this  work. 

Computer  Aleebra 

The  computation  just  described  turns  out  to  involve  vast 
q’lantltles  of  algebra.  We  used  the  computer  algebra  system 
MACSYMA  ([18])  in  order  to  do  the  computation  more  accurately 
and  more  efficiently  than  by  hand.  For  example,  the  key 
formulas  (12) , (26) . (27) . (29)  can  be  represented  in  MACSYMA  via 
the  following  lines  of  code  ([7],  [19])= 

P0ISS0N(F.G):= 

Sl]M(DIFF(F.X[I])wDIFF(G.Y[l])-DIFF(F.Y[I])»*DIFF(G.X[I])  .  1. 1  .N)$ 
L( I . F) : =P0ISS0N(F . WC I ] )$ 

S(I.F):=(IF  1=0  THEN  F  ELSE  SUM(L(I-M.S(M.F)) .M.O, I-l)/I)$ 

K[I] : =(H[I]+P0ISS0N(H[0] .W[I])/I 

+SUM(L(  I-M, K[M] )+M»«S( I-M. H[M] ) . M .  1 . 1-l  )/I )$ 

In  order  to  efficiently  compute  W^  by  the  formulas 

(39), (41).  we  use  the  MACSYMA  tool  called  pattern  matching.  A 
rule  named  WSOLVE  is  defined  as  follows: 


LET(X1  JxYl  ^L»«X2^R»«Y2''S . 

Xl"JwYl''L»«X2''R»*Y2^S»«I»«N/(Wl»(L-J)-W2«(S-R)  ) .  WS0LVE)$ 

.  I  r  s  ‘ 

That  is.  replace  the  term  Xj-'Yj‘X2  by  ■ 

When  WSOLVE  is  applied  to  the  "terms  which  are  already  known"  on 
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the  right  hand  side  of  eq.(36),  the  correct  expression  for  is 

automatically  generated.  Note  that  this  rule  is  not  applied  to 
non-removable  terms  of  the  form  (43). 

One  could  hope  to  simply  apply  these  formulas  to  the 
problem  at  hand,  and  to  thereby  automatically  obtain  the 
transformed  Kamiltonian.  Unfortvinately.  the  size  of  the  0(4) 
computation  is  too  large  to  proceed  directly;  HACSYMA  on  a 
Symbolics  3670  runs  out  of  space.  E.g.  from  eq.(25)  we  see  that 
the  computation  of  involves  the  evaluation  of  the  quantity 

{{{Hj .  Wj}  ,Wj}  .Wj^} .  The  innermost  Poisson  bracket  involves  20 

terms  for  and  20  terms  for  Wj.  i.e.  400  pairs  which  can  be 

collected  together  into  35  terms  (since  there  are  35  fourth 
degree  basis  monomials).  These  then  need  to  be  combined  with 
the  20  terms  of  in  order  to  evaluate  the  second  Poisson 

bracket,  i.e.  700  pairs  which  combine  together  into  56  terms. 
Next  the  third  Poisson  bracket  combines  the  previous  result  with 
the  20  terms  of  to  require  the  computation  of  1120  pairs. 

which  may  be  collected  together  into  84  terms. 

In  order  to  complete  the  computation,  we  broke  it  up 
into  pieces,  each  of  which  was  sufficiently  small  so  as  not  to 
cause  NACSYNA  to  encounter  space  problems.  We  shall  refer  to 
our  strategy  for  treating  such  large  computations  as  the  method 
of  telescoping  compositions.  As  an  example  of  this  strategy,  we 
once  again  consider  the  computation  of  the  triple  Poisson 
bracket  .W^}  .W^}  ,Wj^}  .  We  first  compute  {H^.W^}  and  store 


the  resulting  35  coefficients  in  a  disk  file.  Next, 

instead  of  computii^  {{Hj^,Wj},Wj},  we  compute  instead  {A.Wj^}, 


where  A  is  a  dummy  polynomial  with  symbolic  coefficients  A. 


Irs' 


Although  we  are  eventually  Interested  in  identifying  these 
coefficients  with  those  we  have  stored  in  a  disk  file,  we  save 
that  step  for  later.  We  store  the  resulting  56  coefficients 
Bjirs  of  {A.Wj}  in  a  disk  file.  Next  we  compute  {B.Wj^},  where 
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now  B  is  a  dumny  polynomial  with  symbolic  coeffcients 

This  results  in  84  coefficients  which  are  known  in  terms  of  the 

B.,  coefficients.  The  latter  are  stored  in  a  file  and  are 
jlrs 

known  in  terms  of  the  A.,  coefficients,  which  are  also  stored 

jlrs 

in  a  disk  file.  At  this  point  the  computation  of 

{{{Hj .Wj} .Wj} ,Wj}  is  complete,  although  it  still  remains  to  plug 


the  values  of  the  A.,  and  B,,  coefficients  into  the  final 

jlrs  jlrs 


result. 


For  a  complete  listing  of  the  programs,  see  [7]. 


Results 


The  results  of  this  work  take  the  form  of  expressions 

for  the  transformed  Kamiltonian  K  in  terms  of  the  original 

Hamiltonian  H.  If  we  express  H  in  x  .y  eigencoordinates 

m  m 

defined  by  eqs.(32).  then  Hq  takes  the  canonical  form  (34).  and 
the  polynomials  of  eq.(4)  can  be  written  as 


“n  =  I  «Jlrs 


n  =  j+l+r+s-2 


where 

K., 

jlrs 

(51) 


the  H,,  are  given  constants, 
jlrs 

in  K2  in  eq.(46)  are  given  by: 

^2200  ^  ^00 


Then  the  coefficients 


-  ^1101  ^1110  -  ■^^1200  ”2100  ^0300  ^000^ 

<^2  <^1 


”0210  ^2001 


^1-^2 


H, 


0201 


^2010  . 
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(52) 


The  comparable  coefficients  in  in  eq.(48)  were  also 

found,  but  cannot  be  displayed  here  because  they  are  too  long. 
E.g. ,  the  ASCII  files  for  K33QQ  and  ^0033  164K 


characters,  while  those  for  and  K22I] 


contain  468K 


characters.  These  expressions  simplify  greatly,  however,  in  the 
special  case  in  which  and  are  identically  zero.  Since 


this  special  case  occurs  in  frequently  in  sample  problems,  we 
give  the  associated  coefficients  of  here: 
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^Saoo  “  ^*3300 


(55) 


1  ”0301  ^10  i  **1201  **2110  i  **1210  **2101 


"2  ■  ^  "l  "2  ■  “l  "2  "l 

1  **0310  **3001  4  1  **0400  **4000  4  i  **1300  **3100 

- +  - +  - 

{1)2  +  3  Wj  Uj 

*^2211  ^  **2211 

9  i  **0301  **3010  i  (3  **1201  +  2  **0112)  **2110 


"2  ■  ^  “1  “2  ■  "1 

9  1  **0310  **3001  1  (3  **1210  -  2  **0121)  **2101 

+  - +  - 

Wg  +  3  Uj  ^2  *  *^1 

2  i  **0202  **2020  2  1  **1021  **1201  2  i  **0220  **2002 


"2  ■  "1 


‘"2-"l 


"2  *"1 


2  i  **1012  **1210  2  i  **1102  **1120  3  1  **0211  **3100 

- +  - 


“2  “1 


(tl 


1 


3  i  “1300 


<0 


1 
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(56)  K 


(57) 


1122  ■  ^1122 

9  1  ”0103  ^1030  i  (3  ^0112  +  2  ^1201)  ^1021 


3  0)2  -  “i  *"2  "  "l 

9  1  ^0130  ”1003  1  (3  ^0121  -  2  ^1210)  ^1012 


3o)2  +  o)j  0)2  +  o)j 

2  i  ”0202  ^020  2  1  **0112  **2110  2  i  **0220  **2002 


0)2  -  o)j 


“2  -  *"1 


0*2  +  o,j 


2  i  **0121  **2101  2  i  **0211  **2011  3  i  **0013  **1120 


0)2  +  o)j 

3  i  **0031  **1102 

“1 

“2 

*^0033  ^  **0033 

i  **0103  **1030 

i 

**0130  **1003 

3  0)2  -  o)i 

3  0)2  +  0)^ 

i  **0121  **1012 

+  - 

4 

i  **0004  **0040 

0)2  +  0)^  0)2 


0) 


2 


**0112  **1021 


4  i  **0013  **0031 
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Arnold’s  Theorem 


We  are  interested  in  applying  the  previous  results  to 
the  determination  of  the  stability  of  the  equilibrium  at  the 
origin  in  a  system  of  two  nonlinear  coupled  oscillators  in  which 
has  the  form  (30).  Note  that  the  linearized  Hamiltonian 

differential  equations  (1)  corresponding  to  H  =  Hq  have  purely 

imaginary  eigenvalues,  and  thus  are  inconclusive  regarding 
stability.  Moreover,  because  of  the  minus  sign  in  (30),  is 

not  positive  definite,  and  Lyapunov’s  direct  method  [7]  cannot 
be  used  to  determine  stability. 

For  such  cases,  stability  may  be  determined  by  appealing 
to  a  theorem  of  Arnold  [4],  which  has  been  restated  and  reproved 
by  Meyer  and  Schmidt  [16].  The  theorem,  based  on  the  existence 
of  invariant  tori  in  KAM  theory  [3],  gives  sufficient  conditions 
for  stability  in  nonresonant  systems,  in  terms  of  the 
transformed  Hamiltonian  K(Ij^,l2)  which  has  been  put  in  Birkhoff 

normal  form,  cf.(49).  The  terms  of  eqs.  (44)-(48)  are  thought 

of  as  functions  of  1^  and  I2.  K^(Ij,l2)*  The  theorem  involves 

quantities  defined  by 

(58)  °n  =  V"2’"l5’ 

From  (44)-(48),  the  first  two  non-identical ly  zero  are  D2 

and  D. : 

4 

(59)  ^2  ~  ~  ^^2200  “2  ^1111  “1*^2  ^0022  “l  ^ 

(60)  =  1  (K3300  *  K22„  -  Kjj22  “iS  *  W 

Arnold’s  theorem  states  that  the  origin  is  stable  for 
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those  pareuneter  values  for  which  Dg  0.  In  the  case  that 

^2  =  0,  stability  is  assured  if  ^  0,  and  so  on.  I.e..  the 

origin  is  stable  if  02^^  /  0  for  some  n. 

Using  the  expressions  (51)-(57)  for  the  coefficients 
expressions  for  D2  and  (the  latter  in  the  special  case 

that  Hj=H2=0)  may  be  obtained: 


(61)  ^2  -  -  (<^2  *^2200  “l“2  ^1111  “l  ^0022^ 


+  1  [<^2  Hjjqj  *^1110  "1  ^1011  ^0111 


+  2  <Ji  (Hjjjq  Hqqj2  +  ^1101  ^0021^ 


2  “2  ^”0111  ”2100  “1011  ”1200^ 


*  0)2  ^^0003  ^0030  *  ^0021  ^0012^ 


^|2 

o)j  ^*S000  ^0300  *  ^100  ”1200^ 


4  0)2  +  Wj  4.  ^2  ~ 

*  "1  ”1020  **0102  *  Z  ^  "1  **1002  **0120 


2  0)2  “  Wj 


2  (1)2  +  Uj 


4  Uj  +  <^2  4  Uj  -  0)2 

"2  **2010  **0201  "  7  ;  "2  **2001  **0210  ^ 


2  <i)j  -  <1)2 


2  Cl)  1  +  0)2 


(62)  -  i  (0)2  H33QQ  +  <^2  H2211  +  ^2“!  **1122  "l  **0033) 


“  2  <^1  "2  ^**1102  **1120  *  **0211  **2011  **0202  **2020 
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*  ^1021  ^1201  *  ^1012  ^1210  *  ^220  *^2002 


**0112  **2110  *  **2101  **0121^ 


-  3  (0, 


'2  ^**3100  **0211  *  **1300  **2011^ 


-  3  a?!  (Hqqj3  Hjj2o  ■*■  ^0031  **1102^ 
3 


“  cjg  ^**0004  **0040  *  **0013  **0031^ 


Cl), 


^  ^**4000  **0400  ■*■  **1300  **3100^ 

"2  **1201  **2110  ^"2  +  3  "i)  "2^  ^1210  **2101  ^*^2  “  ^ 


"2  ■"  “l 


0)2  -  <^1 


"l  **1021  **0112  ^"l  *  ^  "2^  "l  **1012  **0121  ^"l  "  ^  “2^ 


"l  +  "2 


“l  -  ‘^2 


"2  **0301  **3010  ^"2  +  Q  "i)  "2  **0310  **3001  ^^2  "  ^  "P 


0)2  +  3  o)j 


0)2  -  3  o)j 


^1  **0103  **1030  ^“1  ^  ^2^  _  ^1  **0130  **1003  ^‘^l  ~  ^  ^2^ 

o)i  +  3  0)2  o)j^  -  3  0)2 


(assumes  =  0) 


The  expression  for  in  the  general  case  is  too  long  to 

be  included  here,  but  is  available  on  our  computer  for  numeric^al 
evaluation. 
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Exzunple  1 


We  consider  a  variation  of  the  Henon-Heiles  Hamiltonian 
where  the  linear  oscillators  are  not  at  low-order  resonance  and 
are  of  different  signs: 

(63)  H  =  ^  (Pj  +  <0  qj  )  -  2  (P2  ^2  ^  ^^l  ^2  "  3  *^2 


Using  the  transformation  to  eigencoordinates  given  by  eq.(32),  H 
becomes 


(64) 


H  = 


1  3  ^  i  3  i  2 

1  <0  XjYj-  i  X2y2  -  3  X2  ^  ^ 

2  1  2 


2  w 
1  2 


2 

1  2 


+  ^  ^1%  -  I  Vy2  -  i  ^1  ^  §  ^1  ^2  ^  i  ^2^2 

(J 


+  -  X 


^i^i 


2^  ^iyiy2 


(0  >  0 


Then  using  eqs. (51)-(53) ,  we  find  the  K2  coefficients  to  be 


(65)  K, 


3  -  8  0) 


2200  ^  2  ,  .  2 

4  oj  (4  (1)  -  1) 


(66)  = 


4  (j^  +  1 
<0  (4  -  1) 


(67)  K, 


0022  ■  12 


Using  eq.(61).  we  find  D2  to  be 


(68)  D2  = 


20  -  53  +  12  0)^  -  9 

12  (j^  (4  (/  -  1) 


We  find  that  D„=0  only  for  u  =  u  2:  1.5752078...  In  order  to 

z  c 

determine  the  stability  of  the  origin  for  u  =  we  must 
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consider  the  condition.  Because  H  contains  only  cubic 

nonlinear  terms  and  because  each  cubic  coefficient  is  simple,  we 
are  able  to  find  the  expression  for  algebraically.  The 

coefficients  for  K.  turn  out  to  be 

4 


i  (1024  +768  -1632  +596  ^  -51) 

48  <0^  (2  u  -  1)^  (2  fc)  +  1)^ 


„  _  i  (384  -288  -H6  -340  -USQ  -6) 

^2211  ■  A  ^  2.  1x3  ^  ia3 

4  w  (1  -  <i)  )  (2  (i)  -  1)  (2  0)  +  1) 


(71)  Kji22= 


(72)  Kqq33  -  "  432 


i  (4  <j^  +  1)(320  -480  +360  -161  +6) 

12  i?  (<j^  -  1)  (2  u  -  1)^  (2  cj  +  1)^ 


235  i 


and  D .  becomes 
4 

(73)  =  (15040  -72400  +113172  -77935 

+14491  <0®  -10188  u®  -3096  +5175  -459) 


432  u®  (cj^  -  1)  (2  w  -  1)®  (2  (j  +  1)® 


So,  at  (j  =  u  ,  D.  =  -0.19180289...  ^  0. 
c  4 

Thus  by  Arnold's  theorem,  the  origin  is  nonl inear ly 
stable.  We  note  that  this  result  does  not  apply  to  a  small  set 
of  resonant  values  of  u  which  correspond  to  vanishing 
denominators  in  the  algorithm  (41).  From  eq.(50)  with  n  =  2,  we 
find  the  followii^  resonant  values  of  u: 

<j  =  (i.  i.  1,  2,  3}. 
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Example  2 


This  second  example  involves  a  spinning  mass-spring 
system,  which  contains  no  odd  powered  terms  in  the  Hamiltonitin. 
Consider  4  identical  springs,  each  attached  at  one  end  to  the 
outer  rim  of  a  wheel  of  unit  r2uiius  separated  by  90®.  The  other 
end  of  each  spring  is  attached  to  a  unit  mass  which  is  free  to 
move  about  its  equilibrium  position  at  the  center,  see  Fig.  1. 
Let  the  Qj~Q2  axes  rotate  with  the  wheel  with  angular  velocity 

u  >  0  relative  to  an  inertial  frame.  Each  spring  is  unstretched 
when  the  mass  is  at  the  origin.  The  potential  energy  for 

each  spring  under  a  deflection  6^  is  taken  to  be 

(74)  V.  =  1  (  6^2  ^  ^  g_4^ 

where  the  linear  spring  constant  has  been  taken  equal  to  |  and  p 

is  a  noiil inear  spring  constant.  Then  this  system  has  the 
Hamiltonian 

(75)  H  =  I  (Pj2  *  p^2,  *  „  (p^Q^  .  p^Q^,  .  Vj  *  ^  V3  . 

where  are  momenta.  Then  upon  taking  the  Taylor  series  of  H 
about  the  origin,  H  becomes  [7] 

(76)  H  =  I  (Pj2  ^  ^  ^  ^  1  ^^2  ^  0^2 j 

+  I  [(4  p  +  1)  -  8  +  (4  p  +  1)  02^] 

■  +  4  (p  -  1)  +  4  (p  -  1)  Q^2q^4  ^  0^6^ 

=Ho+H2^H4+  ... 

Using  the  linear  differential  equations  corresponding  to  Hq.  we 
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Fig.  1.  Exzunple  2  involves  a  spinning  mass-spring  system.  The 
unit  mass  is  restrained  by  4  identical  nonlinear  springs.  The 


axes  are  fixed  to  the  wheel  and  rotate  relative  to  an 


inertial  frame. 
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find  the  characteristic  equation  to  be 

4  2  2  2  2 

(77)  X  +  2  X'^  (1  +  u  )  +  (w  -  1)"*  =  0 

which  has  eigenvalues  X  =  {±  i  ±  i  (l-H))}.  From  this  we 

conclude  that  the  equilibrium  at  the  origin  is  elliptic  for 
(j  ^  1,  i.e..  comprised  of  two  oscillators  with  frequencies  1  -  u 
and  1  +  w  in  the  first  approximation.  Then  using  a  canonical 
eigenvector  transformation  from  (Qm-V  to  (x^.  y^)  gives  [7] 

(78)  Hq  =  i  (1  -  <j)  x^Yj  -  i  (1  +  (o)  X2y2 

which  is  in  the  proper  form  for  our  analysis.  After  similarly 
transforming  H2  and  ,  we  use  eqs. (51)-(57)  to  find  that 

f791  K  —  K  —  ^  ~  ^  V  —  ^  ~  U 

^  ^200  “  0022  “  32  *  1111  “  8 

(80)  K3300  =  i  C(576  -32  p  +20)  -  (864  p^  -48  p  +30)  w 

+  272  p^  -56  p  -15] 

1024  (w  -  1)  (2  u  -  1) 

(81)  K2211  =  3  i  [(1440  p^  -48  p  +58)  (/ 

-  (720  p^  -24  p  +29)  w  -  48  p^  -120  p  -75] 

1024  w  (2  w  -  1) 

(82)  ^1122  =  ^  i[(1440  p^  -48  p  +58) 

+  (720  p^  -24  p  +29)  cj  -  48  p^  -120  p  -75] 

1024  u  (2  u  +  1) 

(83)  =  i  [(576  p^  -32  p  +20)  +  (864  p^  -48  p  +30)  u 

+  272  p^  -56  p  -15] 

1024  (u  +  1)  (2  u  +  1) 
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We  find  Dg  and  using  eqs. (61)-(62)  to  be 


(84)  D,  =  Ll2f^-  1)^(3 


(85) 


°4  = 


[(9792  11^  -352  4  +388)  u®  -  (17744  +264  fi 

+1213)  u®  +  (9104  +232  pi  +657)  -  (1392 

+216  ji  +207)  -  144  -360  4  -225] 


512  (j  (w^  -  1)  (4  G)^  -  1) 


1  2  1 

Then  D2  =  0  for  4  =  and  w  =  which  are  two  lines 
in  the  ii-u  parameter  plane.  When  D2  =  0  we  must  check  the 
condition.  Consider  the  line  4  =  The  value  of  on  this 

line  is 


n  ^  1a  60  (0®  -191  <0®  +104  -33  (/  -36 

(86)  0^(4  =  12 ^  =  ■ 


72  u  (u^  -  1)  (4  </  -  1) 


which  is  zero  only  for  u  =  u  =£  1.6241875...  Now  consider  the 

c 

2  1 

line  (j  =  =.  D.  on  this  line  becomes 
3  4 


(87)  0^(0)^  =  i)  =  ^  (336  4^  +1064  4  +661) 


which  is  zero  only  for  4  =  4^  2  “  ~  8?  (1^  ^  ^  V  238) 
(-0.84870,  -2.31796}. 


We  now  apply  the  stability  theorem.  First,  note  that 
we  consider  cj  >  0  and  that  for  gj  =  1  the  origin  is  not  elliptic 
so  that  our  analysis  does  not  apply  there.  From  eq.(50)  with 
n  =  2,  we  must  also  exclude  gj  =  {^,  2,  3}  from  the  analysis. 

Applying  the  D2  condition,  we  find  that  the  origin  is  stable 
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everywhere  in  the  ii-w  parameter  plane  except  possibly  along  the 

.  1-  1^21 

two  lines  fjL  =  ^  “3' 


On  these  lines  the  D.  condition 

4 


must  be  used.  From  eq.(50)  with  n  =  4,  we  must  now  exclude 

,1  3  2  3  5  ,  1 

(J  =  5.  5}  when  p  = 


^5*  5’  3‘  2’  3* 
origin  is  stable  provided  gj  /  <») 


Elsewhere  on  p  =  the 


on  =  ^,  the  origin  is 


stable  provided  p  ^  p^  2*  three  points  where 

D„  =  D .  =  0.  the  D_  condition  must  be  used  to  prove  nonlinear 
Z  ^  O 


stability. 

We  note  that  for  (j  <  1.  stability  of  the  origin  can  be 
independently  proved  by  Lyapunov's  direct  method  [7]. 


Application  to  the  Problem  of  Three  Bodies 


The  circular  restricted  three  body  problem  is  well-known 
to  exhibit  five  equilibria  in  a  rotating  barycentric  coordinate 
system  [20].  Lj.L2  represent  equilibrium  positions  of 

the  third  body,  in  which  all  three  bodies  are  col linear.  All 
three  of  these  are  unstable  for  all  values  of  the  mass  ratio 
parameter  p.  and  represent  equilibria  where  all  three 

bodies  sit  at  the  vertices  of  an  equilateral  triemgle.  For 
values  of  p  >  p^^  2:  0.0385208,  both  these  equilibria  are 

unstable.  For  p  <  p^^,  Alfriend  [1.2]  showed  that  the  triangular 

points  are  unstable  when  p  =  Pg  and  p^.  special  mass  ratios 

which  cause  the  linearized  frequencies  to  be  in  the  ratio  of  1:2 
and  1:3,  respectively.  For  other  values  of  p  <  p^ ,  stability  of 

(and  L^)  can  be  obtained  by  using  Arnold’s  theorem.  This  was 

first  done  by  Deprit  and  Deprit-Bartholome  [9],  who  calculated 
D2  by  hand.  The  value  they  obtained. 


(88) 


2  2  4  4 

36  -  541  Wj  ^2  +  644 

8(1-4  (4-25  ^1%^) 
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is  non-zero  for  all  values  of  ji  except  for  p  =  p  ^  0.0109136. 

c 

For  p  =  p  .  Arnold's  theorem  requires  the  quantity  D.  be  found. 

C  T 

This  computation  was  performed  by  Meyer  and  Schmidt,  who  found 
2:  -66.6.  The  non-zero  value  of  implies  stability,  by 

Arnold’s  theorem. 

In  what  follows  we  shall  apply  the  results  obtained  in 
this  paper  to  confirm  the  previous  computations  of  Deprit  and 
Deprit-Bartholome  [9]  and  Meyer  and  Schmidt  [16]. 

The  Hamiltonian  for  the  circular  restricted  three-body 
problem  about  the  equilibrium  is  : 

(89)  H  =  I  {P,2  ^  ^  -  '■2«1  -  ^ ‘>2 

Pj  P2 

where  Qg  +  1 

p/  =  *  02^  *  Ql  ♦  v5  Qj  *  1 

Expemding  in  a  Taylor  series  about  the  origin,  H  becomes  2 
where  contains  terms  of  order  n+2  and  Hq  is  given  by: 

(90)  Hg  =  i  (Pj2  *  p/)  *  PjQ^  -  P^Qj  *  1  Qj2 

-  I  02^  -  ^  (1  -  2p)  Q,Q2 
4 

Then  using  the  linearized  differential  equations  corresponding 
to  Hq,  the  characteristic  equation  for  the  system  is  fovind  to 

be: 
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(91)  ^  (1-nr^)  =  0  where  -r  =  1  -  2(i 


The  eigenvalues  X  have  positive  real  parts  for 

H  >  H.  =  _  (1  -  _ )  implying  the  system  is  linearly  unstable. 

2  9 

For  p  <  Pj.  the  system  is  critically  stable  having  eigenvalues 
iicjj  and  ii<>>2  *h®re: 


n  ^  ..  2.  2  , 

0  <  (jg  <  _  <  Qj  <  1  .  (jj  *  ^2  “  1  • 


and 


2  2  27  ,,  2. 

Up  =  —  (1-Tr  ) 
^  16 


Using  a  canonical  linear  transformation  (see  [6]),  Hq(Q^,P^)  is 
transformed  into  which  is  of  the  form  (30).  Then 

following  eqs.  (30)-(34)  we  introduce  the  variables  (Xj^^.y^jj)  and 
find  the  following  components  of  K2* 


(92) 


*^2200  " 


-<J2^(124  -  696  +  81) 

144  (2Wj2  -  if  (^2^  - 


(93) 


(94) 


K 


1111 


<0j  <^2  (64  -  64  -  43) 

6  -  if  -  41.^2)  (Uj2  _ 


^0022  “ 


<jj^(124  +  448  -  491) 

144  (2a -  1)^  ^"^2  ~  "1^^ 


and  the  first  stability  condition  becomes: 
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= 


-  644  +  1288  -  1185  +  541  -  36 

8  (2fc)j^  -  1)^  (og  -  2uj)  («2  +  2uj)  (2w2  “  "j)  (2«2  +  “i) 


which  is  equivalent  to  the  expression  (88)  found  by  Deprit  and 
Deprit-Bartholome  [9].  Then,  on  0  <  p  <  Pj.  D2  =  0  only  for: 


3  v®"  -  J  2  V199945  +  3265 


6 


2s  0.0109136 


At  this  value  of  p  =  p  .  the  components  of  K.  become: 

C  A 


(95) 

(96) 

(97) 

(98) 


*^300 

0.219259187 

i 

+ 

6.52  E-37 

^11 

-7.79324843 

i 

+ 

3.74  E-35 

\C 

ni22 

209.933620 

i 

+ 

2.35  E-34 

^0033 

14.5264460 

1 

+ 

1.75  E-34 

auid  D.  becomes: 

4 

(99)  -  66.6  -  4.27  E-36  1 

The  very  small  real  part  of  each  amd  imaginary  part  of 

results  from  taking  only  a  finite  number  of  digits  (40  in  fact) 
in  the  numerical  approximation.  Because  the  real  part  is  so 
much  larger  than  the  error  term,  the  approximation  a:  -66.6 

is  accurate  and  D.  /  0.  Hence,  at  p  =  p  ,  the  Hamiltonian 
*  c 

system  is  stable.  These  values  for  the  coefficients  of  eund 
agree  with  those  obtained  by  Meyer  and  Schmidt  [16]. 
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Abstract 


We  apply  the  method  of  averaging  to  first  order  in  e  to  the  autonomous 

system 


3 

x' ’  +  a  X  +  p  X  =  e  g(x.x’) 

This  involves  perturbing  off  of  Jacobian  elliptic  functions,  rather  than  off  of 
trignometric  functions  as  is  usually  done.  The  resulting  equations  involve 
Integrals  of  elliptic  functions  which  are  evaluated  using  a  program  written  in 
the  computer  algebra  system  MACSYMA.  The  results  are  applied  to  the  problem  of 
finding  limit  cycles  in  the  above  differential  equation. 
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Introduction 


A  limitation  of  most  texts  which  treat  nonlinear  vibration  problems  by 
perturbation  methods  is  that  most  problems  involve  perturbing  off  of  the  sine 
and  cosine  solutions  of  simple  harmonic  oscillators.  For  example,  consider  the 
nonlinear  oscillator: 


(1) 


x’ '  +  X  =  fc 


f  3^  1  .  ^  31  2  .  .31  1 

I -X  +  ^  X  +  ^  X  X  -  X  I .  wi  th  e  =  jQ 


The  usual  approach  to  studying  eq.(l)  involves  assuming  that  the  parameter  e  is 
asymptotically  small,  and  perturbing  off  of  the  associated  equation  (for  e  =  0) 


(2) 


which  has  the  general  solution 


x*  *  +  X  =  0 


(3) 


X  =  C  cos  (t  +  B) 


The  method  of  averaging  [7-9,11-16]  seeks  a  solution  to  eq.(l)  when  e  /  0  in 
the  form: 


(4) 


X  =  C(t)  cos  'P(t) 
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Variation  of  parameters  and  averaging  over  the  unperturbed  period  2ir  gives  the 
usual  formulas: 


(5.1) 


C 


sin  dyp 


2ir 

(5.2)  \p‘  =  I  +  2|c  J  ^(**^*) 


in  which  eq.(l)  has  been  written  in  the  form 

(6)  x’ •  +  X  +  e  G(x.x‘ )  =  0 

31  31  2  3 

Evaluating  eqs.(5)  with  G(x.x')  =  x  -  ^  x’  -  x  x’  +  x'  gives 

(7.1)  C*  =  I^C  (C^  +  20) 

(7.2)  =  1  +  |e 

Nontrivial  fixed  points  of  eq.(7.1)  are.  in  view  of  (4),  periodic  motions 
(limit  cycles)  of  eq.(l).  Since  the  only  fixed  point  of  (7.1)  is  C  =  0,  the 
method  of  averaging  predicts  that  there  are  no  limit  cycles  for  eq.(l).  This 
prediction  is,  however,  erroneous!  See  Fig.l  which  shows  the  results  of 
numerically  integrating  eq.(l). 

This  embarrassing  failure  of  averaging  may  be  remedied  in  two  ways. 

One  may  extend  the  averaging  process  to  second  order,  i.e.,  include  terms  of 

2 

0(fe  ).  This  involves  combining  the  avereiging  process  with  a  near-identity 
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3 


Fig.l.  Limit  cycle  of  eq.(l)  obtained  by  numerical  integration  (N).  Also 
shown  is  the  analytic  approximation  (A)  for  the  limit  cycle  obtained  by  using 
first  order  averaging  utilizing  elliptic  functions,  to  be  discussed  later,  see 
eq.('40).  Note  that  first  order  averaging  utilizing  trigonometric  functions 
fails  to  predict  a  limit  cycle  in  this  case,  cf.  eq.(7.1). 
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trems  format  ion  of  dejjendent  variables.  This  route  has  been  treated  in  [14], 
and  computer  algebra  (MACSYMA)  programs  have  been  presented  there  to  automate 
the  process.  Alternatively,  one  may  stay  with  first  order  averaging,  but 
follow  the  path  presented  in  this  paper. 

In  this  paper  we  treat  a  class  of  problems  which  involve  perturbing  off 
of  Jacobian  elliptic  functions.  We  consider  the  differential  equation 

(8)  X* •  +  a  X  +  p  +  £  g(x,x*)  =0,a>0,p>0 

in  which  p  is  not  assumed  to  be  a  small  quantity.  The  unperturbed  system  is 

(9)  x“  +  a  X  +  /3  x^  =  0 
which  has  the  general  solution 


(10) 


X  =  C  cn(u,k). 


u 


t  +  u 

o 


and 


2(a  +  /3  C^) 


where  cn(u,k)  is  a  Jacobitin  elliptic  function.  We  use  the  method  of  averaging 
implemented  on  MACSYMA  to  treat  this  type  of  problem.  We  compare  results  found 
using  elliptic  functions  with  those  found  using  trigonometric  functions.  In 
particular  we  will  return  to  eq.(l)  later  in  this  paper. 

Although the method of averaging has been treated in numerous
references (e.g. [7-9,11-16]), each deals almost exclusively with perturbations
off of the simple harmonic oscillator, eq.(2). A few authors have considered
perturbations off of nonlinear systems using elliptic functions. Kuzmak [10]
looked for periodic solutions in eq.(8) using a multiple scale method, where α
and β are slowly varying parameters. Chirikov [3] studied resonance overlap in
multiple harmonic excitations of eq.(8). Davis [4] investigated second order
ordinary differential equations using elliptic functions. Cap [2] applied the
method of averaging to perturbations of the mathematical pendulum. Greenspan
and Holmes [6] and Guckenheimer and Holmes [7] have applied the Melnikov method
to perturbations of eq.(8) where α < 0. Nayfeh [12], Kevorkian & Cole [9] and
Sanders & Verhulst [15] have also treated such problems. In most of these
references the authors have reduced the problem to the evaluation of integrals
which, through complicated algebraic manipulations, may often be expressed in
terms of standard elliptic integrals. By using MACSYMA, we have been able to
treat a large class of problems by efficiently evaluating the associated
integrals.

We begin with a brief review of elliptic functions. Then we present a
general treatment of averaging for systems of the form of eq.(8), and finally we
apply the method to the problem of finding limit cycles in eq.(8).

Jacobian Elliptic Functions

Jacobian elliptic functions involve a collection of identities which
are similar to those for trigonometric functions but are more complicated
algebraically. The use of computer algebra makes manipulation of these
identities easier, permitting investigations to proceed on problems which were
previously avoided because of the quantity of algebra involved. All formulas
and conventions concerning Jacobian elliptic functions in this paper are taken
from Byrd and Friedman's Handbook of Elliptic Integrals for Engineers and
Physicists [1].
We now offer a brief comparison of elliptic functions with the more
familiar trigonometric functions. Corresponding to sin(u) and cos(u) are three
fundamental elliptic functions sn(u,k), cn(u,k), and dn(u,k). Each of the
elliptic functions depends on the modulus k as well as the argument u. These
reduce to sin(u), cos(u), and 1, respectively, when k = 0. The sn and sin
functions share common properties, as do cn and cos. These are summarized in
Table 1. The dn function has no trigonometric counterpart. Note that the
elliptic functions sn and cn may be thought of as generalizations of sin and
cos whose period depends on the modulus k.

The argument u is identified as the incomplete elliptic integral of the
first kind, which is usually denoted F(φ,k). This identification shows that u
also depends on k. The value of k normally ranges from 0 to 1. The sn, cn,
and dn functions are shown in Fig. 2 for k² = 1/2.

Table 1

Property      sn(u,k)   sin(u)   cn(u,k)   cos(u)   dn(u,k)
Max. value    1         1        1         1        1
Min. value    -1        -1       -1        -1       (1 - k²)^(1/2)
Period        4K(k)     2π       4K(k)     2π       2K(k)
Odd/Even      Odd       Odd      Even      Even     Even
df/du         cn dn     cos      -sn dn    -sin     -k² sn cn
f at k = 0    sin       sin      cos       cos      1

K(k) = complete elliptic integral of the first kind; K(0) = π/2, K(1) = +∞.
Fig. 2. Comparison of elliptic functions for k² = 1/2 with trigonometric
functions. The period of the elliptic functions is 4K(k² = 1/2) ≈ 7.416, cf.
Table 1.
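As a quick numerical check of the K(k) entries in Table 1 and of the period quoted in the Fig. 2 caption, the following minimal Python sketch evaluates K(k) with the arithmetic-geometric mean (AGM), K(k) = π / (2 AGM(1, √(1 − k²))). The function name `ellipk` is our own illustrative choice and is not part of the authors' MACSYMA program.

```python
import math


def ellipk(k: float) -> float:
    """Complete elliptic integral of the first kind K(k), modulus 0 <= k < 1.

    Uses K(k) = pi / (2 * AGM(1, sqrt(1 - k^2))).
    """
    if not 0.0 <= k < 1.0:
        raise ValueError("modulus must satisfy 0 <= k < 1")
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(60):                  # the AGM converges quadratically
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


# Table 1: K(0) = pi/2, so sn and cn reduce to sin and cos with period 2*pi.
print(4.0 * ellipk(0.0))                 # 2*pi
# Fig. 2 uses k^2 = 1/2; the period of sn and cn is 4 K(k).
print(4.0 * ellipk(math.sqrt(0.5)))      # approximately 7.416
```

K(k) grows monotonically with k and diverges as k → 1, consistent with K(1) = +∞ in Table 1; the sketch therefore rejects k ≥ 1.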


The elliptic functions also satisfy the following identities, which
correspond to sin² + cos² = 1:

$$\mathrm{sn}^2 + \mathrm{cn}^2 = 1 \tag{11.1}$$
$$k^2\,\mathrm{sn}^2 + \mathrm{dn}^2 = 1 \tag{11.2}$$
$$1 - k^2 + k^2\,\mathrm{cn}^2 = \mathrm{dn}^2 \tag{11.3}$$
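To make the Table 1 properties and identities (11) concrete, here is a minimal Python sketch that evaluates sn, cn and dn with the AGM / descending Landen scheme of Abramowitz & Stegun (section 16.4). That algorithm choice, and the helper names `ellipk` and `jacobi_sn_cn_dn`, are our assumptions for illustration; the paper itself works symbolically in MACSYMA. Because dn is computed from (11.2), the independent checks are (11.1), the derivative d(sn)/du = cn dn from Table 1, and the values at u = K.

```python
import math


def ellipk(k):
    """K(k) = pi / (2 AGM(1, sqrt(1 - k^2))), 0 <= k < 1."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(60):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def jacobi_sn_cn_dn(u, k):
    """Return (sn, cn, dn)(u, k) for 0 <= k < 1 via the AGM / descending Landen scheme."""
    if not 0.0 <= k < 1.0:
        raise ValueError("modulus must satisfy 0 <= k < 1")
    if k == 0.0:
        return math.sin(u), math.cos(u), 1.0
    a_seq, c_seq = [1.0], [k]
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(60):
        a, b, c = 0.5 * (a + b), math.sqrt(a * b), 0.5 * (a - b)
        a_seq.append(a)
        c_seq.append(c)
        if abs(c) <= 1e-15 * a:
            break
    n = len(a_seq) - 1
    phi = (2.0 ** n) * a_seq[n] * u          # phi_N = 2^N a_N u
    for j in range(n, 0, -1):                # phi_{j-1} from phi_j
        phi = 0.5 * (phi + math.asin(c_seq[j] / a_seq[j] * math.sin(phi)))
    sn, cn = math.sin(phi), math.cos(phi)    # phi is now am(u, k)
    dn = math.sqrt(1.0 - (k * sn) ** 2)      # identity (11.2); dn > 0 for k < 1
    return sn, cn, dn


k, u, h = 0.6, 0.7, 1e-5
sn, cn, dn = jacobi_sn_cn_dn(u, k)
print(sn**2 + cn**2)                          # identity (11.1): 1
print(dn**2 - (1.0 - k**2 + k**2 * cn**2))    # identity (11.3): about 0
dsn = (jacobi_sn_cn_dn(u + h, k)[0] - jacobi_sn_cn_dn(u - h, k)[0]) / (2.0 * h)
print(dsn - cn * dn)                          # Table 1, d(sn)/du = cn dn: about 0
print(jacobi_sn_cn_dn(ellipk(k), k))          # at u = K: about (1, 0, sqrt(1 - k^2))
```

Computing dn from (11.2) rather than from the Landen quotient avoids a 0/0 at u = K, where cn vanishes.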

The Unperturbed Solution

We shall consider unperturbed systems of the form of eq.(9). We find
the general solution by assuming the solution in the form

$$x = C\,\mathrm{cn}(At + B, k) = C\,\mathrm{cn}(u, k) = C\,\mathrm{cn} \tag{12}$$

where the argument is omitted for brevity and where A and C are positive
constants. Substituting (12) into eq.(9) we find

$$\left[C A^2 (2k^2 - 1) + \alpha C\right]\mathrm{cn} + \left[C^3 \beta - 2k^2 A^2 C\right]\mathrm{cn}^3 = 0 \tag{13}$$

where we have used the relation

$$\frac{d^2\,\mathrm{cn}(u,k)}{du^2} = \mathrm{cn}'' = (2k^2 - 1)\,\mathrm{cn} - 2k^2\,\mathrm{cn}^3 \tag{14}$$
For nontrivial solutions (C ≠ 0), we find

$$A^2 (1 - 2k^2) = \alpha \tag{15.1}$$
$$\beta C^2 = 2k^2 A^2 \tag{15.2}$$

We define three separate cases depending on the parameters α and β.


Case I: α ≠ 0, β = 0

From eqs.(15) we find that

$$k = 0, \qquad A^2 = \alpha, \qquad C \text{ undetermined} \tag{16.1}$$

which is the correct solution for the harmonic oscillator.

Case II: α ≠ 0, β ≠ 0

From eqs.(15) we find that

$$A^2 = \alpha + \beta C^2, \qquad k^2 = \frac{\beta C^2}{2(\alpha + \beta C^2)} \tag{16.2}$$

For C > 0, the range of k² is 0 < k² < 1/2. We note that cases I and III are
limiting cases of II that are recoverable by setting k² equal to zero or
one-half.
Case III: α = 0, β ≠ 0

From eqs.(15) we find that

$$k^2 = \frac{1}{2}, \qquad A^2 = \beta C^2 \tag{16.3}$$

The oscillator is purely nonlinear in this instance.

In all three cases, the origin is a center and the (x, ẋ) phase space
is filled with periodic orbits. For cases II and III, the period of an orbit
depends on its amplitude, see Table 2. As the amplitude of the vibration
approaches zero (C → 0), the period of oscillation increases to the value
T = 2π/√α, which becomes infinite for case III.
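The following minimal Python sketch checks eqs.(9), (12) and (16) numerically. It builds A and k from α, β and C via eq.(16.2) (which also covers cases I and III), evaluates x = C cn(At, k) and ẋ = −CA sn dn with B = 0, and confirms that a finite-difference ẍ satisfies eq.(9) and that the first integral H of eq.(17) stays at αC²/2 + βC⁴/4. The AGM / descending Landen evaluation of cn (Abramowitz & Stegun 16.4), the helper names, and the parameter values are assumptions of this sketch.

```python
import math


def jacobi_sn_cn_dn(u, k):
    """(sn, cn, dn)(u, k), 0 <= k < 1, via the AGM / descending Landen scheme."""
    if k == 0.0:
        return math.sin(u), math.cos(u), 1.0
    a_seq, c_seq = [1.0], [k]
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(60):
        a, b, c = 0.5 * (a + b), math.sqrt(a * b), 0.5 * (a - b)
        a_seq.append(a)
        c_seq.append(c)
        if abs(c) <= 1e-15 * a:
            break
    n = len(a_seq) - 1
    phi = (2.0 ** n) * a_seq[n] * u
    for j in range(n, 0, -1):
        phi = 0.5 * (phi + math.asin(c_seq[j] / a_seq[j] * math.sin(phi)))
    sn, cn = math.sin(phi), math.cos(phi)
    return sn, cn, math.sqrt(1.0 - (k * sn) ** 2)


def unperturbed_params(alpha, beta, C):
    """A and k from eq.(16.2); beta = 0 gives case I, alpha = 0 gives case III."""
    A2 = alpha + beta * C * C
    return math.sqrt(A2), math.sqrt(beta * C * C / (2.0 * A2))


def state(t, alpha, beta, C):
    """x and x' from eq.(12) with B = 0: x = C cn(At, k), x' = -C A sn dn."""
    A, k = unperturbed_params(alpha, beta, C)
    sn, cn, dn = jacobi_sn_cn_dn(A * t, k)
    return C * cn, -C * A * sn * dn


def residual_and_energy(alpha, beta, C, t, h=1e-4):
    """Residual of eq.(9) using a finite-difference x'', and H of eq.(17)."""
    x_minus, _ = state(t - h, alpha, beta, C)
    x, xdot = state(t, alpha, beta, C)
    x_plus, _ = state(t + h, alpha, beta, C)
    xddot = (x_plus - 2.0 * x + x_minus) / (h * h)
    residual = xddot + alpha * x + beta * x ** 3
    H = 0.5 * xdot ** 2 + 0.5 * alpha * x ** 2 + 0.25 * beta * x ** 4
    return residual, H


for alpha, beta in [(1.0, 0.0), (1.0, 0.5), (0.0, 1.0)]:   # cases I, II, III
    results = [residual_and_energy(alpha, beta, 1.5, t) for t in (0.3, 1.1, 2.7)]
    print(alpha, beta, max(abs(r) for r, _ in results), [H for _, H in results])
```

The residual is limited only by the finite-difference step, while H is reproduced to rounding error, which reflects the exactness of eq.(16.2).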

Table 2

Case    Period T
I       2π/√α
II      4K(k(C))/√(α + βC²)
III     4K(k² = 1/2)/(√β C)
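All three rows of Table 2 are instances of T = 4K(k)/A with A and k taken from eq.(16.2). The minimal Python sketch below (the function names are our own) evaluates that single formula and checks the limits discussed above: T = 2π/√α in case I, T → 2π/√α as C → 0 in case II, and T = 4K(k² = 1/2)/(√β C) in case III.

```python
import math


def ellipk(k):
    """K(k) = pi / (2 AGM(1, sqrt(1 - k^2))), 0 <= k < 1."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(60):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def period(alpha, beta, C):
    """Period T = 4 K(k(C)) / A(C) of the unperturbed orbit of amplitude C (Table 2).

    A and k come from eq.(16.2); beta = 0 and alpha = 0 reproduce cases I and III.
    """
    A2 = alpha + beta * C * C
    if A2 <= 0.0:
        raise ValueError("need alpha + beta C^2 > 0 (case III with C = 0 has T = infinity)")
    k = math.sqrt(beta * C * C / (2.0 * A2))
    return 4.0 * ellipk(k) / math.sqrt(A2)


print(period(1.0, 0.0, 2.0))      # case I: 2*pi, independent of C
print(period(1.0, 0.1, 1e-6))     # case II, small C: approaches 2*pi/sqrt(alpha)
print(period(1.0, 0.1, 3.0))      # case II: shorter period at larger amplitude
print(period(0.0, 1.0, 2.0))      # case III: 4 K(k^2 = 1/2) / (sqrt(beta) C)
```

For β > 0 the period decreases as C grows (a hardening spring), which is why T increases toward 2π/√α as C → 0.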
Since eq.(9) is Hamiltonian, it possesses the first integral

$$H = \frac{1}{2}\dot{x}^2 + \frac{1}{2}\alpha x^2 + \frac{1}{4}\beta x^4 \tag{17}$$

which provides another method for solving eq.(9). We define action-angle
variables (J, φ) for this Hamiltonian [5,7] in order to provide more "natural"
variables to be used later in setting up the averaging scheme. After some
lengthy calculations, we find that

$$J = J(C) \tag{18.1}$$
$$4K(k)\,\varphi = At + B = u \tag{18.2}$$

For simplicity, we take the variables (C, φ) as primitive.

It is interesting to note why the variable φ is preferred to B in
deriving the averaging scheme. First note that although each orbit in phase
space is orbitally stable [8,11,16], it is Lyapunov unstable. This is because
the frequency of an orbit depends on its amplitude, and motions starting close
together but on two different orbits eventually become far apart (in fact, out
of phase), even though their orbits are close.

In the next section we derive the equations governing averaging based
on the variables (C, φ), cf. eqs.(5) for the simple harmonic oscillator. In a
similar fashion we could attempt to derive comparable equations based on using
the phase B or the argument u of the unperturbed solution (cf. eq.(12)) instead
of the angle variable φ. In doing so for (C, B), we would obtain equations of
the form (before averaging):

$$\dot{C} = \epsilon f_1(C, At + B), \qquad \dot{B} = -\dot{A}\,t + \epsilon f_2(C, At + B) \tag{19}$$
in which f₂ turns out not to be periodic in t. The method of averaging
requires that the variational equations be periodic. Thus eqs.(19) are
unsuitable for averaging. The orbital stability of the solutions is reflected
in the variational equation for C (or equivalently in the variational equation
for A, since A and C are related algebraically, cf. (16.2)). The Lyapunov
(phase) instability is reflected in the variational equation for B.

Similarly, choosing (C, u) as primitive variables, u = At + B, gives

$$\dot{C} = \epsilon f_1(C, u), \qquad \dot{u} = A + \epsilon f_2(C, u) \tag{20}$$

in which f₂ is not periodic in u, so that eqs.(20) are again unsuitable for
averaging.

However, setting u = 4K(k(C)) φ, cf. (18.2), gives

$$\dot{C} = \epsilon f_1(C, 4K\varphi), \qquad \dot{\varphi} = \frac{A}{4K} + \epsilon f_2(C, 4K\varphi) \tag{21}$$

in which both f₁ and f₂ are found to be periodic in φ and hence in the correct
form for averaging. Thus the unperturbed solution can be written as

$$x = C\,\mathrm{cn}(4K\varphi, k) \tag{22.1}$$
$$\dot{x} = C A\,\mathrm{cn}'(4K\varphi, k) = C\sqrt{\alpha + \beta C^2}\;\mathrm{cn}'(4K\varphi, k) \tag{22.2}$$
$$K = K(k), \qquad k = k(C), \qquad \mathrm{cn}' \equiv \frac{\partial\,\mathrm{cn}}{\partial u} \tag{22.3}$$
which can be viewed as a generalized van der Pol transformation from (x, ẋ) to
(C, φ). In this way, (C, φ) constitute "natural" variables because they take
into account the change of period occurring from orbit to orbit in the
unperturbed flow.

Variation of Parameters

In order to obtain a solution to eq.(8) when ε ≠ 0, we vary the
parameters (C, φ) so that C = C(t) and φ = φ(t) in eqs.(22). Differentiating x
in (22.1) and equating the result to (22.2), we obtain


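$$\frac{dC}{dt}\left(\mathrm{cn} + 4CK'k'\varphi\,\mathrm{cn}' + Ck'\frac{\partial\,\mathrm{cn}}{\partial k}\right) + 4CK\,\mathrm{cn}'\,\frac{d\varphi}{dt} = CA\,\mathrm{cn}' \tag{23}$$

where primes denote differentiation with respect to the argument (the first
argument in the case of cn). Differentiating eq.(22.2), we find

$$\ddot{x} = \frac{dC}{dt}\left[(A + A'C)\,\mathrm{cn}' + 4CAK'k'\varphi\,\mathrm{cn}'' + CAk'\frac{\partial\,\mathrm{cn}'}{\partial k}\right] + 4CAK\,\mathrm{cn}''\,\frac{d\varphi}{dt} \tag{24}$$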

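We substitute eqs.(24) and (22.1) into (8) and solve for dC/dt and dφ/dt. The
result can be written in matrix notation as

$$W \begin{pmatrix} dC/dt \\ d\varphi/dt \end{pmatrix} = \begin{pmatrix} CA\,\mathrm{cn}' \\ -\epsilon g - \alpha C\,\mathrm{cn} - \beta C^3\,\mathrm{cn}^3 \end{pmatrix} \tag{25}$$

where W is the 2×2 matrix of the coefficients of dC/dt and dφ/dt in eqs.(23) and (24).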
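In solving these equations, we need to compute the determinant of W:

$$\det[W] = 4CKA\left(\mathrm{cn}\,\mathrm{cn}'' - \mathrm{cn}'^2\right) - 4C^2KA'\,\mathrm{cn}'^2 + 4C^2AKk'\left(\mathrm{cn}''\frac{\partial\,\mathrm{cn}}{\partial k} - \mathrm{cn}'\frac{\partial\,\mathrm{cn}'}{\partial k}\right) = -4KCA = -4KC(\alpha + \beta C^2)^{1/2} \tag{26}$$

where we have used eq.(14), eq.(16.2), and the identities [1]:

$$\mathrm{cn}'^2 = (1 - \mathrm{cn}^2)(1 - k^2 + k^2\,\mathrm{cn}^2) \tag{27.1}$$
$$\mathrm{cn}'\frac{\partial\,\mathrm{cn}'}{\partial k} = \frac{1}{2}\frac{\partial}{\partial k}\left(\mathrm{cn}'^2\right) = \frac{1}{2}\frac{\partial}{\partial k}\left[(1 - \mathrm{cn}^2)(1 - k^2 + k^2\,\mathrm{cn}^2)\right] \tag{27.2}$$

Note that the determinant of W is independent of φ. We then solve eq.(25) to
find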


$$\frac{dC}{dt} = -\epsilon\, g\,(\alpha + \beta C^2)^{-1/2}\,\mathrm{cn}' \tag{28.1}$$

$$\frac{d\varphi}{dt} = \frac{(\alpha + \beta C^2)^{1/2}}{4K} + \epsilon\, g\,\frac{(\alpha + \beta C^2)^{-1/2}}{4KC}\left[\mathrm{cn} - \frac{2\alpha}{2\alpha + \beta C^2}\left(Z\,\mathrm{cn}' + \frac{\beta C^2}{2(\alpha + \beta C^2)}\,\mathrm{cn}\,(1 - \mathrm{cn}^2)\right)\right] \tag{28.2}$$

where Z = Z(4Kφ, k) denotes the Jacobi Zeta function (an odd, 2K-periodic
function with zero mean) and all arguments are 4Kφ. In eq.(28.2) we have
used [1]:

$$\frac{\partial\,\mathrm{cn}}{\partial k} = \frac{\mathrm{cn}'}{k(1 - k^2)}\left[(1 - k^2)\,4K\varphi - E(4K\varphi, k)\right] - \frac{k}{1 - k^2}\,\mathrm{cn}\,(1 - \mathrm{cn}^2) \tag{29.1}$$
$$Z(4K\varphi, k) = E(4K\varphi, k) - 4\varphi E \tag{29.2}$$

where E(4Kφ, k) is shorthand notation for E(θ, k), the incomplete elliptic
integral of the second kind (where θ = am(4Kφ, k) and am(u, k) is the elliptic
amplitude function [1]), and E = E(k) denotes the complete elliptic integral of
the second kind.

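We consider eqs.(28) in the three cases I, II, III separately:

Case I:

$$A^2 = \alpha, \qquad k = 0, \qquad K(k = 0) = \frac{\pi}{2}, \qquad Z(u, k = 0) = 0 \tag{30}$$

$$\frac{dC}{dt} = -\epsilon\,\frac{1}{\sqrt{\alpha}}\, g\,\mathrm{cn}'(2\pi\varphi, k = 0) = \epsilon\,\frac{1}{\sqrt{\alpha}}\, g\,\sin(2\pi\varphi) \tag{31.1}$$

$$\frac{d\varphi}{dt} = \frac{\sqrt{\alpha}}{2\pi} + \epsilon\,\frac{1}{2\pi C\sqrt{\alpha}}\, g\,\cos(2\pi\varphi) \tag{31.2}$$

which agrees with the perturbation equations off the linear oscillator,
cf. eqs.(5) with ψ = 2πφ.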


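Case II:

Here the variables C and k are related. Since the modulus k is a
natural elliptic function quantity, one could formulate eqs.(28) in terms of k
and φ:

$$\frac{dk}{dt} = \sqrt{\frac{\beta}{2\alpha}}\,(1 - 2k^2)^{3/2}\,\frac{dC}{dt} \tag{32.1}$$

$$\frac{d\varphi}{dt} = \frac{\sqrt{\alpha}}{4K\sqrt{1 - 2k^2}} + \epsilon\, g\,\frac{\sqrt{\beta}\,(1 - 2k^2)}{4\sqrt{2}\,K\alpha k}\left[\mathrm{cn} - \frac{1 - 2k^2}{1 - k^2}\left(Z\,\mathrm{cn}' + k^2\,\mathrm{cn}\,(1 - \mathrm{cn}^2)\right)\right] \tag{32.2}$$

Note that this formulation breaks down for cases I and III.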
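Eq.(32.1) converts dC/dt into dk/dt through the derivative dk/dC of eq.(16.2), written in terms of k alone as dk/dC = √(β/(2α)) (1 − 2k²)^(3/2). The minimal Python sketch below (function names are our own) compares that expression, as reconstructed here, against a central finite difference of k(C) for one arbitrary case II parameter set.

```python
import math


def modulus(alpha, beta, C):
    """k(C) from eq.(16.2)."""
    return math.sqrt(beta * C * C / (2.0 * (alpha + beta * C * C)))


def dk_dC(alpha, beta, C):
    """dk/dC written in terms of k alone, the factor used in eq.(32.1)."""
    k = modulus(alpha, beta, C)
    return math.sqrt(beta / (2.0 * alpha)) * (1.0 - 2.0 * k * k) ** 1.5


alpha, beta, C, h = 1.0, 0.7, 1.3, 1e-6
fd = (modulus(alpha, beta, C + h) - modulus(alpha, beta, C - h)) / (2.0 * h)
print(fd, dk_dC(alpha, beta, C))   # the two values agree closely
```

The factor √(β/(2α)) is why the formulation fails in case III (α = 0), and in case I (β = 0) k is identically zero.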


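Case III:

Eqs.(28) simplify to

$$\frac{dC}{dt} = -\epsilon\,\frac{1}{\sqrt{\beta}}\,\frac{1}{C}\, g\,\mathrm{cn}' \tag{33.1}$$

$$\frac{d\varphi}{dt} = \frac{\sqrt{\beta}\,C}{4K} + \epsilon\,\frac{1}{\sqrt{\beta}}\,\frac{1}{4KC^2}\, g\,\mathrm{cn} \tag{33.2}$$

where K = K(k² = 1/2). We will formulate the averaging procedure in terms of (C, φ), and for case II we
will use k and C interchangeably (via eq.(32.1)).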
The Averaging Procedure

While eqs.(28) are valid for any perturbation g, in this section we
consider perturbations of the form g = g(x, ẋ), where g is a polynomial in x
and ẋ. We write eqs.(28) in the form

$$\dot{C} = \epsilon F_1(C, \varphi) \tag{34.1}$$
$$\dot{\varphi} = \frac{(\alpha + \beta C^2)^{1/2}}{4K} + \epsilon F_2(C, \varphi) = \Omega(C) + \epsilon F_2(C, \varphi) \tag{34.2}$$

where the F_i, as given by eqs.(28), are periodic in φ.

We denote the averaged variables by (C̄, φ̄). Then the averaged
equations become

$$\dot{\bar{C}} = \epsilon \bar{F}_1 + O(\epsilon^2) \tag{35.1}$$
$$\dot{\bar{\varphi}} = \Omega(\bar{C}) + \epsilon \bar{F}_2 + O(\epsilon^2) \tag{35.2}$$

where the F̄_i are the mean values of the F_i over one period of the unperturbed system,

$$\bar{F}_i = \int_0^1 F_i\, d\varphi = \frac{1}{4K}\int_0^{4K} F_i(C, u)\, du \tag{36}$$

where u = 4Kφ, K = K(k), and k = k(C) as given by eqs.(16).
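Eq.(36) can also be evaluated numerically, which gives an independent check on the symbolic route. The minimal Python sketch below averages F₁ = −g cn′/A (eq.(28.1) without the factor ε) over one period by substituting θ = am(u), so that cn = cos θ, sn = sin θ, dn = √(1 − k² sin² θ) and du = dθ/dn; no Jacobi function evaluations are needed. It then compares the result with the closed form −(P K − Q E)/(350 C K) that the paper reports for eq.(1) (α = 1, β = 1/10). The helper names, the AGM formulas for K and E (Abramowitz & Stegun 17.6), and the trapezoid rule are our assumptions.

```python
import math


def complete_integrals(k):
    """(K(k), E(k)) from the AGM; E = K (1 - sum_n 2^(n-1) c_n^2), 0 <= k < 1."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    total, weight = 0.5 * k * k, 0.5          # n = 0 term, with c_0 = k
    for _ in range(60):
        c = 0.5 * (a - b)
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        weight *= 2.0
        total += weight * c * c
        if abs(c) <= 1e-15 * a:
            break
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - total)


def averaged_F1(g, alpha, beta, C, n=2000):
    """F1-bar of eq.(36): mean of F1 = -g(x, x') cn' / A over u in [0, 4K)."""
    A = math.sqrt(alpha + beta * C * C)
    k = math.sqrt(beta * C * C / (2.0 * A * A))
    K, _ = complete_integrals(k)
    h = 2.0 * math.pi / n                     # theta = am(u) runs over [0, 2*pi)
    total = 0.0
    for j in range(n):
        s, c = math.sin(j * h), math.cos(j * h)
        dn = math.sqrt(1.0 - (k * s) ** 2)
        cnp = -s * dn                         # cn' = -sn dn
        x, xdot = C * c, C * A * cnp          # eqs.(22.1), (22.2)
        total += (-g(x, xdot) * cnp / A) / dn # F1 times du/d(theta)
    return total * h / (4.0 * K)


def g_eq1(x, v):
    """Perturbation of eq.(1): g = -x'/2 - (31/10) x^2 x' + x'^3."""
    return -0.5 * v - 3.1 * x * x * v + v ** 3


def closed_form_F1(C):
    """-(P K - Q E) / (350 C K) for alpha = 1, beta = 1/10, as printed for eq.(1)."""
    s = C * C
    K, E = complete_integrals(math.sqrt(s / (2.0 * s + 20.0)))
    P = 5 * s**3 + 447 * s**2 + 10175 * s + 64700
    Q = 594 * s**2 + 11880 * s + 64700
    return -(P * K - Q * E) / (350.0 * C * K)


for C in (1.0, 1.5, 2.5):
    print(C, averaged_F1(g_eq1, 1.0, 0.1, C), closed_form_F1(C))
```

The trapezoid rule is used because the integrand is smooth and periodic in θ, so it converges very quickly; the two columns should agree to roughly machine precision.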
Computer Algebra Implementation of the Averaging Scheme

We present a short summary of our implementation of the averaging
scheme on the computer algebra system MACSYMA. The perturbation g is composed
of a sum of terms of the form

$$x^m \dot{x}^n = C^{m+n} A^n\, \mathrm{cn}^m\, \mathrm{cn}'^n \tag{37}$$

each of which can be written, using eq.(27.1), as a sum of terms of the form

$$C^{m+n} A^n\, \mathrm{cn}^j\, \mathrm{cn}', \qquad n \text{ odd} \tag{38.1}$$
$$C^{m+n} A^n\, \mathrm{cn}^j, \qquad n \text{ even} \tag{38.2}$$

It is therefore sufficient to consider g to be composed of a sum of terms of
the form cn^m and cn^m cn'. By inspection of eqs.(28), we can make a list of
all combinations of elliptic functions which can possibly occur in the
integrands of eqs.(36), and their mean values. The integrands are listed in
Table 3 and their mean values in Table 4.
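The reduction from (37) to (38) only needs eq.(27.1) applied repeatedly: every pair of cn′ factors becomes a polynomial in cn. The minimal Python sketch below, which plays the same role as the REDUC routine in the Appendix but is our own independent illustration (names hypothetical), represents polynomials in cn as coefficient lists and uses exact `Fraction` arithmetic.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials in cn given as coefficient lists (index = power of cn)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def reduce_cnp_power(n, k2):
    """Rewrite cn'^n as cn'^(n % 2) * sum_j coeffs[j] cn^j using eq.(27.1).

    k2 is the squared modulus k^2. Returns (has_cn_prime_factor, coeffs).
    """
    k2 = Fraction(k2)
    # eq.(27.1): cn'^2 = (1 - cn^2)(1 - k^2 + k^2 cn^2)
    cnp_squared = poly_mul([Fraction(1), Fraction(0), Fraction(-1)],
                           [1 - k2, Fraction(0), k2])
    coeffs = [Fraction(1)]
    for _ in range(n // 2):
        coeffs = poly_mul(coeffs, cnp_squared)
    return n % 2 == 1, coeffs


# x^m x'^n = C^(m+n) A^n cn^m cn'^n, eq.(37); for example x'^3 with k^2 = 1/4:
odd, coeffs = reduce_cnp_power(3, Fraction(1, 4))
print(odd, coeffs)   # cn' * (3/4 - (1/2) cn^2 - (1/4) cn^4)
```

Multiplying the result by cn^m gives exactly the two term types of (38): a polynomial in cn when n is even, and cn′ times a polynomial in cn when n is odd.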


Table 3. Terms occurring in F1 and F2

Expression   Typical terms
F1           cn^m,  cn^m cn'
F2           cn^m,  cn^m cn',  Z cn^m,  Z cn^m cn'
Table 4. Mean values of elliptic functions

Function       Mean value
cn^m           D_m for m even;  0 for m odd
cn^m cn'       0
Z cn^m         0
Z cn^m cn'     0 for m even;  -[(1 - k² - E/K) D_(m+1) + k² D_(m+3)]/(m + 1) for m odd

where D_0 = 1, D_2 = (E/K - 1 + k²)/k², and, for even m ≥ 4,

$$D_m = \frac{(m - 2)(2k^2 - 1)\,D_{m-2} + (m - 3)(1 - k^2)\,D_{m-4}}{(m - 1)\,k^2}$$

For k = 0 these reduce to D_2 = 1/2 and D_m = (m - 1) D_(m-2)/m.
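The D_m recurrence is the one coded as D[II] in the Appendix listing. The minimal Python sketch below evaluates it and cross-checks it against direct quadrature of the mean of cn^m. The quadrature substitutes θ = am(u), so cn = cos θ and du = dθ/√(1 − k² sin² θ). K and E come from the AGM (Abramowitz & Stegun 17.6). The function names are illustrative assumptions.

```python
import math


def complete_integrals(k):
    """(K(k), E(k)) from the AGM; E = K (1 - sum_n 2^(n-1) c_n^2), 0 <= k < 1."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    total, weight = 0.5 * k * k, 0.5
    for _ in range(60):
        c = 0.5 * (a - b)
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        weight *= 2.0
        total += weight * c * c
        if abs(c) <= 1e-15 * a:
            break
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - total)


def mean_cn_powers(k, m_max):
    """D_m (mean of cn^m over one period 4K) for m = 0..m_max, from Table 4."""
    K, E = complete_integrals(k)
    k2 = k * k
    D = [0.0] * (m_max + 1)                   # odd powers have zero mean
    D[0] = 1.0
    if m_max >= 2:
        D[2] = 0.5 if k2 == 0.0 else (E / K - 1.0 + k2) / k2
    for m in range(4, m_max + 1, 2):
        if k2 == 0.0:
            D[m] = (m - 1) / m * D[m - 2]
        else:
            D[m] = ((m - 2) * (2.0 * k2 - 1.0) * D[m - 2]
                    + (m - 3) * (1.0 - k2) * D[m - 4]) / ((m - 1) * k2)
    return D


def mean_cn_power_quadrature(k, m, n=2000):
    """The same mean by the periodic trapezoid rule in theta = am(u)."""
    K, _ = complete_integrals(k)
    h = 2.0 * math.pi / n
    total = sum(math.cos(j * h) ** m / math.sqrt(1.0 - (k * math.sin(j * h)) ** 2)
                for j in range(n))
    return total * h / (4.0 * K)


k = 0.6
D = mean_cn_powers(k, 8)
for m in range(9):
    print(m, D[m], mean_cn_power_quadrature(k, m))
```

The recurrence divides by k², so rounding errors grow for very small k. The separate k = 0 branch (cos^m averages) avoids that limit entirely.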


Armed with Table 4, one could find the averaged equations for a given
perturbation g(x, ẋ) by hand. This lengthy calculation, however, is much
better suited to MACSYMA. The MACSYMA program which implements the foregoing
averaging procedure is listed in the Appendix. As an example of its use, we
next apply the method to the problem of finding limit cycles in eq.(8). We
begin by returning to eq.(1), then we generalize the example.
If we write eq.(1) in the form

$$\ddot{x} + x + \frac{1}{10}x^3 = \epsilon\left[\frac{1}{2}\dot{x} + \frac{31}{10}x^2\dot{x} - \dot{x}^3\right], \qquad \epsilon = \frac{1}{10}$$

we identify α = 1, β = 1/10, g = -(1/2)ẋ - (31/10)x²ẋ + ẋ³. Substitution of these
values into eqs.(28) and averaging gives (see the sample run of our MACSYMA
program in the Appendix):

$$\dot{\bar{C}} = -\epsilon\,\frac{P(C)\,K - Q(C)\,E}{350\,C\,K}, \qquad \dot{\bar{\varphi}} = \frac{(1 + C^2/10)^{1/2}}{4K} + O(\epsilon) \tag{39}$$

where

$$P(C) = 5C^6 + 447C^4 + 10175C^2 + 64700, \qquad Q(C) = 594C^4 + 11880C^2 + 64700, \qquad k^2 = \frac{C^2}{2C^2 + 20}$$

and where K = K(k) and E = E(k) represent complete elliptic integrals of the
first and second kinds, respectively. Numerical evaluation of the condition
C̄' = 0 gives the limit cycle amplitude C = 1.9861. Then eq.(22.1) gives the
following approximation for the limit cycle:

$$x = 1.9861\,\mathrm{cn}(1.1808\,t,\ k = 0.37608) \tag{40}$$

This approximation offers reasonable agreement with numerical integration of
eq.(1) for ε = 1/10, see Fig. 1. Note that first order averaging off of the
simple harmonic oscillator failed to predict a limit cycle for this equation,
cf. eq.(7.1).
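The amplitude C = 1.9861 comes from solving P(C) K − Q(C) E = 0. The minimal Python sketch below reproduces that step by bisection. K and E come from the AGM (Abramowitz & Stegun 17.6). The sketch then recovers A = √(1 + C²/10) and k from eq.(16.2). P and Q are hard-wired for α = 1, β = 1/10, and the helper names are our own. Since C̄′ ∝ −(P K − Q E), the sign change from negative to positive also shows that this limit cycle is stable.

```python
import math


def complete_integrals(k):
    """(K(k), E(k)) from the AGM; E = K (1 - sum_n 2^(n-1) c_n^2), 0 <= k < 1."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    total, weight = 0.5 * k * k, 0.5
    for _ in range(60):
        c = 0.5 * (a - b)
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        weight *= 2.0
        total += weight * c * c
        if abs(c) <= 1e-15 * a:
            break
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - total)


def slow_flow_numerator(C):
    """P(C) K - Q(C) E for eq.(1) (alpha = 1, beta = 1/10); zero at a limit cycle."""
    s = C * C
    K, E = complete_integrals(math.sqrt(s / (2.0 * s + 20.0)))
    P = 5 * s**3 + 447 * s**2 + 10175 * s + 64700
    Q = 594 * s**2 + 11880 * s + 64700
    return P * K - Q * E


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


C = bisect(slow_flow_numerator, 1.0, 3.0)
k = math.sqrt(C * C / (2.0 * C * C + 20.0))   # k^2 = C^2 / (2 C^2 + 20)
A = math.sqrt(1.0 + C * C / 10.0)            # A = sqrt(alpha + beta C^2)
print(C, A, k)   # compare with x = 1.9861 cn(1.1808 t, k = 0.37608)
```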

Example: Limit Cycles in Eq.(8)


We investigate limit cycle solutions in systems of the form

$$\ddot{x} + \alpha x + \beta x^3 + \epsilon g = 0 \tag{41}$$

in which

$$g = \delta\dot{x} + \sum_{i,j} \mu_{ij}\, x^i \dot{x}^j, \qquad 2 \le i + j \le 4$$

Using eq.(27.1), eq.(28.1), and Table 4, we find that the only terms that make
nonzero contributions to C̄' are

$$\delta\dot{x}, \qquad \mu_{21}\,x^2\dot{x}, \qquad \mu_{03}\,\dot{x}^3 \tag{42}$$

The condition for a limit cycle is that C̄' be zero, i.e., F̄₁ = 0. This
condition on the parameters δ, μ21, μ03 will then determine the limit cycle
radius (if a limit cycle exists). The other ten terms in g do not influence
the existence of a limit cycle (to O(ε)). Therefore, we take a modified
perturbation for g (writing μ = μ21 and η = μ03):

$$g = \delta\dot{x} + \mu x^2\dot{x} + \eta\dot{x}^3 \tag{43}$$

Note also that δ = μ = η = 0 implies the existence of a family of closed
orbits, and not limit cycles.
We find F₁ and F̄₁ to be (cf. eqs.(34), (36)):

$$F_1 = -C\left[\delta\,\mathrm{cn}'^2 + \mu C^2\,\mathrm{cn}^2\,\mathrm{cn}'^2 + \eta(\alpha + \beta C^2)\,C^2\,\mathrm{cn}'^4\right] \tag{44.1}$$
$$\bar{F}_1 = -C\left[\delta V_1 + \mu C^2 V_2 + \eta(\alpha + \beta C^2)\,C^2 V_3\right] \tag{44.2}$$

where

$$V_1 = \text{mean of } \mathrm{cn}'^2 = -\frac{1}{3k^2K}\left[K(k^2 - 1) + E(1 - 2k^2)\right]$$
$$V_2 = \text{mean of } \mathrm{cn}^2\,\mathrm{cn}'^2 = -\frac{1}{15k^4K}\left[K(k^4 - 3k^2 + 2) - 2E(k^4 - k^2 + 1)\right]$$
$$V_3 = \text{mean of } \mathrm{cn}'^4 = \frac{1}{35k^4K}\left[K(8k^6 - 13k^4 + 3k^2 + 2) - 2E(8k^6 - 12k^4 + 2k^2 + 1)\right]$$

We drop the bar notation for convenience here and in what follows. The value
of k is related to C by eqs.(16). The V_i turn out to be positive for valid
values of k. Ignoring the trivial case C = 0, a limit cycle exists (F̄₁ becomes
zero) for:

$$\delta V_1 + \mu C^2 V_2 + \eta C^2(\alpha + \beta C^2)\,V_3 = 0 \tag{45}$$

Eq.(45) can be viewed as a relationship between the limit cycle amplitude C and
the parameters δ, μ, and η. When any two of these parameters are zero, there
can be no limit cycle. Eq.(45) is singular when the product ηβ vanishes,
i.e., the normally quadratic equation in C² becomes linear. When ηβ = 0,
eq.(45) reduces to:

$$C^2 = -\frac{\delta V_1}{\mu V_2 + \eta\alpha V_3}, \qquad \eta\beta = 0 \tag{46.1}$$

Eq.(46.1) can have at most one positive root, and hence there can be at most
one limit cycle. For ηβ ≠ 0, we solve eq.(45) to get

$$C^2 = \frac{-(\alpha\eta V_3 + \mu V_2) \pm \sqrt{(\alpha\eta V_3 + \mu V_2)^2 - 4\beta\delta\eta V_1 V_3}}{2\beta\eta V_3}, \qquad \eta\beta \ne 0 \tag{46.2}$$

For cases I and III, k (and hence each V_i) does not depend on C. Eqs.(46) are
then explicit relations between the parameters and amplitude C for the
existence of a limit cycle. For case II, however, k does depend on C, so that
eqs.(46) only implicitly define C (since V_i depends on k). Investigating
eq.(46.2) numerically, we find C² can be made to have zero, one, or two
positive roots for real C. A bifurcation occurs along the curve that is the
intersection of eq.(45) with

$$\frac{\partial}{\partial C}\left[\delta V_1 + \mu C^2 V_2 + \eta C^2(\alpha + \beta C^2)\,V_3\right] = 0 \tag{47}$$
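The closed forms for V₁, V₂, V₃ can be cross-checked against direct quadrature of the means of cn′², cn² cn′² and cn′⁴. The minimal Python sketch below does this with θ = am(u), so cn′² = sin² θ dn² and du = dθ/dn. K and E come from the AGM (Abramowitz & Stegun 17.6). The function names are our own. The sketch also reproduces the case III values 1/3, 0.09139… and 1/7 of eq.(49.1), and the case I limits 1/2, 1/8 and 3/8 of eq.(48.1).

```python
import math


def complete_integrals(k):
    """(K(k), E(k)) from the AGM; E = K (1 - sum_n 2^(n-1) c_n^2), 0 <= k < 1."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    total, weight = 0.5 * k * k, 0.5
    for _ in range(60):
        c = 0.5 * (a - b)
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        weight *= 2.0
        total += weight * c * c
        if abs(c) <= 1e-15 * a:
            break
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - total)


def v_closed_form(k):
    """V1, V2, V3 of eqs.(44) from the closed forms in K(k), E(k); 0 < k < 1."""
    K, E = complete_integrals(k)
    k2, k4, k6 = k**2, k**4, k**6
    V1 = -(K * (k2 - 1) + E * (1 - 2 * k2)) / (3 * k2 * K)
    V2 = -(K * (k4 - 3 * k2 + 2) - 2 * E * (k4 - k2 + 1)) / (15 * k4 * K)
    V3 = (K * (8 * k6 - 13 * k4 + 3 * k2 + 2)
          - 2 * E * (8 * k6 - 12 * k4 + 2 * k2 + 1)) / (35 * k4 * K)
    return V1, V2, V3


def v_quadrature(k, n=2000):
    """Means of cn'^2, cn^2 cn'^2 and cn'^4 over one period, with theta = am(u)."""
    K, _ = complete_integrals(k)
    h = 2.0 * math.pi / n
    sums = [0.0, 0.0, 0.0]
    for j in range(n):
        s, c = math.sin(j * h), math.cos(j * h)
        dn = math.sqrt(1.0 - (k * s) ** 2)
        cnp2 = (s * dn) ** 2                  # cn'^2 = sn^2 dn^2
        for i, f in enumerate((cnp2, c * c * cnp2, cnp2 * cnp2)):
            sums[i] += f / dn                 # du = d(theta) / dn
    return tuple(x * h / (4.0 * K) for x in sums)


print(v_closed_form(math.sqrt(0.5)))   # case III: (1/3, 0.09139..., 1/7)
print(v_quadrature(math.sqrt(0.5)))
print(v_quadrature(0.0))               # case I limit: (1/2, 1/8, 3/8)
```

The closed forms divide by k² or k⁴ and lose accuracy through cancellation as k → 0. For that reason the sketch takes the case I limit from the quadrature rather than from the formulas.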
Limit cycles on this curve are degenerate. As one moves across this
bifurcation curve, two limit cycles coalesce at a finite non-zero radius. We
continue the discussion by considering the limiting cases I and III:

Case I: Results for the Linear Oscillator

The values of V_i become indeterminate at k = 0. By taking limits we
find (cf. eq.(46.1)):

$$V_1 = \frac{1}{2}, \qquad V_2 = \frac{1}{8}, \qquad V_3 = \frac{3}{8} \tag{48.1}$$
$$C^2 = -\frac{4\delta}{\mu + 3\alpha\eta} \tag{48.2}$$

This agrees with the solution found in [14] by perturbing off of the linear
oscillator.


Case III: Results for the Purely Nonlinear Oscillator

We evaluate V_i and C² to be (cf. eqs.(46)):

$$V_1 = \frac{1}{3}, \qquad V_2 = 0.09139\ldots, \qquad V_3 = \frac{1}{7} \tag{49.1}$$
$$C^2 = -\frac{\delta}{3V_2\,\mu}, \qquad \eta = 0 \tag{49.2}$$
$$C^2 = \frac{7\left[-\mu V_2 \pm \sqrt{\mu^2 V_2^2 - \frac{4}{21}\beta\delta\eta}\right]}{2\beta\eta}, \qquad \eta \ne 0 \tag{49.3}$$

We continue the discussion of this problem by considering the number of limit
cycles which occur for given values of the parameters, i.e., the bifurcation
set.

The Bifurcation Set

The cases η = 0 and η ≠ 0 are considered separately. The latter case
is then divided into the two cases β = 0 and β ≠ 0.

Case η = 0

From eq.(46.1), we expect at most one limit cycle with amplitude C satisfying

$$\frac{V_2}{V_1}\,C^2 = p_0 \tag{50}$$

where p0 = -δ/μ is a parameter.


We now compare limit cycle bifurcation curves for cases I, II, and III.
Eq.(50) reduces in these instances to (cf. eqs.(48.1), (16.2), (49.1)):

$$\text{Case I:}\quad \frac{C^2}{4} = p_0 \tag{51.1}$$
$$\text{Case II:}\quad \frac{\alpha}{\beta}\,\frac{2k^2}{1 - 2k^2}\,\frac{V_2(k)}{V_1(k)} = p_0 \tag{51.2}$$
$$\text{Case III:}\quad 0.27417\ldots\,C^2 = p_0 \tag{51.3}$$
Fig. 3. Limit cycle amplitude C for eqs.(41) and (43) with η = 0. C, shown as
the abscissa, is determined by the intersection of a particular V2/V1 curve
(which depends on α/β) with the straight line p0 = -δ/μ. The arrow shows the
increase in limit cycle amplitude C resulting from increasing α/β while holding
μ fixed.

A graph of eqs.(51) appears in Fig. 3. Note that cases I and III provide bounds
for case II and that C is rather insensitive to the parameter ratio α/β.
Numerical experiments support these analytical predictions.

Case η ≠ 0

From eq.(45), we find

$$p_2 = \frac{C^2}{V_1}\left[p_1 V_2 + (\alpha + \beta C^2)\,V_3\right] \tag{52}$$

where p1 = μ/η and p2 = -δ/η are parameters.

Eq.(52) defines a family of straight lines in the (p1, p2) parameter plane with
slopes and intercepts parameterized by α, β, and C. Both the slope and the
p2-intercept have the value zero at C = 0, and increase as C increases.

Case η ≠ 0, β = 0 (Case I)

Eq.(52) becomes (cf. eqs.(16.1), (48.1)):

$$p_2 = \frac{C^2}{4}\left[3\alpha + p_1\right] \tag{53}$$

with p1-intercept at point P (p1 = -3α, p2 = 0) for all values of C. A graph
of eq.(53) parameterized by C is given in Fig. 4. There is one limit cycle in
regions I and II; there are none in regions III and IV. The p2 = 0 line is a
Hopf bifurcation curve where a limit cycle is born at the origin. On the
bifurcation line p1 = -3α, a limit cycle of infinite amplitude is predicted.
Point P is a highly singular point: near P, the limit cycle amplitude is very
sensitive to small changes in p1 and p2.

Fig. 4. Limit cycles in eq.(41) for β = 0. The parameters p1 and p2 are
defined by p1 = μ/η and p2 = -δ/η, cf. eq.(53). Along each straight line there
exists a limit cycle of fixed amplitude. Thus, in regions I and II there
exists 1 limit cycle, while in regions III and IV there are no limit cycles.
The p1 axis corresponds to the limiting case of a limit cycle of zero amplitude
(and, hence, a Hopf bifurcation occurs as one crosses the p1 axis). The
dashed line is p1 = -3α and corresponds to limit cycles of infinite
amplitude. The arrow shows the direction of increasing limit cycle amplitude.

Case η ≠ 0, β ≠ 0 (Cases II and III)

The p1 intercept moves out from p1 = -3α at C = 0 towards infinity as
C → ∞. With this information, we plot eq.(52), parameterized by C, in the
(p1, p2) plane (see Fig. 5). One limit cycle exists in region I, two in region
II (where each point lies on exactly two intersecting lines), and none in
region III. A degenerate limit cycle exists on the bifurcation curve between
II and III. The p1 axis is a Hopf bifurcation curve where a limit cycle is
born at the origin. Point P (p1 = -3α, p2 = 0) is again a singular point
where a degenerate limit cycle of zero amplitude exists. Near P, the
sensitivity of the amplitude to p1 and p2 depends on the smallness of β.
The predictions of Fig. 5 are in agreement with the results of numerical
integration of the original differential equation (41).

A comparison of the linear analysis (β = 0, Fig. 4) with the nonlinear
analysis (β ≠ 0, Fig. 5) shows qualitatively different results. In both
analyses, a perturbation term of the form εμ30x³ does not contribute to
determining the existence of a limit cycle. Yet for β small, the nonlinear
analysis does not reduce to the linear one. The linear analysis fails to
predict one limit cycle in region IV of Fig. 4 and two limit cycles in part of
region II for β small.
Fig. 5. Limit cycles in eq.(41) for β ≠ 0. The parameters p1 and p2 are
defined by p1 = μ/η and p2 = -δ/η, cf. eq.(52). Along each straight line there
exists a limit cycle of fixed amplitude. Thus, in region I there exists 1
limit cycle; in region II there exist 2 limit cycles; and in region III there
are no limit cycles. The p1 axis corresponds to the limiting case of a limit
cycle of zero amplitude (and, hence, a Hopf bifurcation occurs as one crosses
the p1 axis). Along the curve separating region II from III two limit cycles
coalesce. The arrow shows the direction of increasing limit cycle amplitude.


Numerical simulations confirm the nonlinear analysis. Eq.(1) provided
an example with the following parameter values:

$$\alpha = 1, \quad \beta = \epsilon = 0.1, \quad \delta = -0.5, \quad \mu = -3.1, \quad \eta = 1 \tag{54}$$

in which the system belongs to region IV of Fig. 4 and region I of Fig. 5. As
we saw before, the analysis based on elliptic functions agreed with numerical
integration, while the usual trigonometric approach failed to predict a limit
cycle.

Another example is afforded by the parameter values:

$$\alpha = 1, \quad \beta = 2\epsilon = 0.1, \quad \delta = 1, \quad \mu = -4.6, \quad \eta = 1 \tag{55}$$

in which the system belongs to region II of Figs. 4 and 5. A numerical
simulation finds two limit cycles with amplitudes 1.93 and 2.93. Using
eq.(52), the predicted values are 1.97 and 2.59, which compare well with the
numerical integration values. The linear prediction eq.(48.2) predicts only
one limit cycle with amplitude 1.58.
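The predicted amplitudes quoted for eq.(55) can be reproduced by locating the sign changes of the left side of eq.(45). The minimal Python sketch below does this by a coarse scan followed by bisection. It uses the closed forms for V₁, V₂, V₃ and computes K and E via the AGM (Abramowitz & Stegun 17.6). The helper names and the scan interval 0.5 ≤ C ≤ 4 are our own choices. The sketch reads β = 0.1 from eq.(55); ε does not enter the averaged condition. The last line evaluates the linear (β = 0) prediction of eq.(48.2).

```python
import math


def complete_integrals(k):
    """(K(k), E(k)) from the AGM; E = K (1 - sum_n 2^(n-1) c_n^2), 0 <= k < 1."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    total, weight = 0.5 * k * k, 0.5
    for _ in range(60):
        c = 0.5 * (a - b)
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        weight *= 2.0
        total += weight * c * c
        if abs(c) <= 1e-15 * a:
            break
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - total)


def v_closed_form(k):
    """V1, V2, V3 of eqs.(44); valid for 0 < k < 1."""
    K, E = complete_integrals(k)
    k2, k4, k6 = k**2, k**4, k**6
    V1 = -(K * (k2 - 1) + E * (1 - 2 * k2)) / (3 * k2 * K)
    V2 = -(K * (k4 - 3 * k2 + 2) - 2 * E * (k4 - k2 + 1)) / (15 * k4 * K)
    V3 = (K * (8 * k6 - 13 * k4 + 3 * k2 + 2)
          - 2 * E * (8 * k6 - 12 * k4 + 2 * k2 + 1)) / (35 * k4 * K)
    return V1, V2, V3


def eq45_lhs(C, alpha, beta, delta, mu, eta):
    """delta V1 + mu C^2 V2 + eta C^2 (alpha + beta C^2) V3, with k(C) from eq.(16.2)."""
    k = math.sqrt(beta * C * C / (2.0 * (alpha + beta * C * C)))
    V1, V2, V3 = v_closed_form(k)
    return delta * V1 + mu * C * C * V2 + eta * C * C * (alpha + beta * C * C) * V3


def limit_cycle_amplitudes(params, c_lo=0.5, c_hi=4.0, steps=350):
    """Amplitudes C in [c_lo, c_hi] where eq.(45) changes sign (requires beta > 0)."""
    def f(C):
        return eq45_lhs(C, *params)

    grid = [c_lo + (c_hi - c_lo) * i / steps for i in range(steps + 1)]
    roots = []
    for lo, hi in zip(grid, grid[1:]):
        f_lo, f_hi = f(lo), f(hi)
        if f_lo * f_hi < 0.0:
            for _ in range(60):               # bisection on [lo, hi]
                mid = 0.5 * (lo + hi)
                f_mid = f(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return roots


params_55 = (1.0, 0.1, 1.0, -4.6, 1.0)        # (alpha, beta, delta, mu, eta), eq.(55)
print(limit_cycle_amplitudes(params_55))      # two amplitudes, near 1.97 and 2.59
print(math.sqrt(-4.0 * 1.0 / (-4.6 + 3.0)))   # eq.(48.2) with beta = 0: about 1.58
```

A scan followed by bisection is used instead of the quadratic formula (46.2) because in case II the V_i depend on C through k. As noted above, eq.(46.2) therefore only defines C implicitly.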

Conclusions

With the advent of computer algebra, perturbation analyses using
elliptic functions can now be done almost as easily as those using trigonometric
functions. We have shown that perturbing off of elliptic functions will
generally provide better quantitative and in some cases better qualitative
results than a comparable perturbation off of trigonometric functions. In some
problems, averaging off of elliptic functions (which contain an
amplitude-frequency dependence that trigonometric functions lack) provides
results at first order which can only be attained by averaging off of
trigonometric functions to second order. In the case of limit cycles in
eq.(41), first order trigonometric averaging gives qualitatively incorrect
predictions if β ≠ 0 and p1 < -3α, cf. Figs. 4 and 5.

Related work in progress by the authors includes the extension of the
averaging method off of elliptic functions to include terms of O(ε²). This
involves computing a near-identity transformation and is a generalization of
second order averaging off of trigonometric functions (see [14]). Additional
applications of the MACSYMA program have been made to the forced Duffing
equation and to systems of the form of eq.(9) in which α and β are slowly
varying functions of time. In particular, extensions of this work to problems
in which α and β are not necessarily positive are in progress.
Appendix. MACSYMA Computer Program Listing

/** ROUTINE TO PERTURB OFF X'' + AL X + BE X^3 + E G(X,X') = 0 **/

AVERAGE():=BLOCK([X,Y,XX,YY,EC,KC,AL,BE,G,F,FX2,FZ2,FI,FBAR,HI,D,CFLOW,PFLOW],
PRINT("AVERAGING OF X'' + AL X + BE X^3 + EPS G(X,X',EPS*T)"),
PRINT(" "),AL:READ("ENTER AL:"),
PRINT(" "),BE:READ("ENTER BE:"),
PRINT(" "),PRINT("ENTER G(X,X') USING Y=X':"),
G:READ(),
PRINT(" "),PRINT("THE SOLUTION TO THE UNPERTURBED SYSTEM IS"),
PRINT("X = C CN(4*KC(C)*PHI,K)"),
PRINT("X' = C SQRT(AL + BE C^2) CN'(4*KC(C)*PHI,K)"),
PRINT("WHERE 0 <= K^2 = BE C^2/2/(AL + BE C^2) <= 1/2"),
PRINT("KC = COMPLETE ELLIPTIC INTEGRAL OF FIRST KIND"),
PRINT("AND 4*KC(K)*PHI = SQRT(AL + BE C^2)*T+B"),PRINT(" "),
PRINT("SEEK PERTURBED SOLUTION OF SAME FORM WHERE (C,PHI)"),
PRINT("BECOME FUNCTIONS OF TIME"),
PRINT(" "),
/** X = C CN(4*KC*PHI) **/
/** Y = X' = C SQRT(AL + BE C^2) CN'(4*KC*PHI) **/
/** SYMBOLS **/
/** XX = CN FUNCTION **/
/** YY = CN' FUNCTION (DERIVATIVE OF CN W.R.T. ARGUMENT) **/
/** ZZ = ZETA FUNCTION **/
/** KC,EC = COMPLETE ELLIPTIC INTEGRALS OF 1ST,2ND KINDS **/
/** K = MODULUS **/
KILL(K),
/** FOR SPECIAL CASES, K IS A NUMBER **/
IF AL = 0 THEN K:SQRT(1/2),
IF BE = 0 THEN (K:0,KC:EC:%PI/2),
/** REDUC ROUTINE TO REDUCE EXPRESSIONS TO FORMS: CN^M AND CN^M CNP **/
REDUC(EXPR):=BLOCK([EVEN,ODD,VAL],
EVEN:EXPAND((EXPR+EV(EXPR,YY=-YY))/2),
ODD:EXPAND((EXPR-EVEN)/YY),
ODD:YY*EXPAND(EV(ODD,YY=SQRT((1-XX^2)*(1-K^2+K^2*XX^2)))),
EVEN:EXPAND(EV(EVEN,YY=SQRT((1-XX^2)*(1-K^2+K^2*XX^2)))),
VAL:EVEN+ODD),
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/»«  AVERAGING  PROCEDURE  *«/ 


G :  EV(G .  X=OXX .  Y=OSQRT(  AL+BE»«Cr2)»«YY) . 

F[  1  ] :  -1/SQRT(  AL+BE«(r2)»*REDUC(G»tYY) . 

F[2] :  l/C/4>^C/SQRT(AL+BEKr2) 

♦tREDUC(G»t(XX-(  l-2»tK"2)/{  1-K^2)h(ZZ»*YY+K^2hXX»*{  l-XX'^2)  )  )  )  . 

IF  K  =  0  THEN  F[2]:EV(F[2].ZZ=0). 

F[1]:EV(F[1].YY=0).  /»♦  CN^M  CNP  TERMS  HAVE  NO  MEAN  h/ 

FZ2:RATa)EF(FC2].ZZ).  /*♦  PICK  OFF  Z  TERMS  IN  F[2]  »/ 

FX2:EXPAND(F[2]-FZ2»«ZZ).  M  PICK  OFF  X  TERMS  IN  F[2]  »«/ 

FZ2:EXPAND(EV(FZ2-EV(FZ2.YY=0).YY=1)).  /»  Z  CN^M  TERMS  HAVE  NO  MEAN  »/ 

FX2:EV(FX2.YY=0).  /»*  CN^M  CNP  TERMS  HAVE  NO  MEAN 

/** MEAN VALUE ROUTINE **/
D[0]:1,
D[1]:0,
D[2]:1/KBAR^2*(EC/KC-1+KBAR^2),
D[3]:0,
D[II]:=RATSIMP(1/(II-1)/KBAR^2*((II-2)*(2*KBAR^2-1)*D[II-2]
  +(II-3)*(1-KBAR^2)*D[II-4])),
IF K = 0 THEN (D[2]:1/2,D[II]:=RATSIMP((II-1)/II*D[II-2])),
IF K = SQRT(1/2) THEN KBAR:SQRT(1/2),

/** FIND MEAN USING TABLE 4 **/
HI:MAX(HIPOW(F[1],XX),HIPOW(FX2,XX),HIPOW(FZ2,XX)),
FOR II:1 THRU 2 DO FBAR[II]:0,
FOR II:0 THRU HI DO (
  FBAR[1]:FBAR[1]+RATCOEF(F[1],XX,II)*D[II],
  FBAR[2]:FBAR[2]+RATCOEF(FX2,XX,II)*D[II]
    -RATCOEF(FZ2,XX,II)/(II+1)*((1-KBAR^2-EC/KC)*D[II+1]
    +KBAR^2*D[II+3])
),

/** CHANGE RESULTS TO PRINTABLE FORM **/
FOR II:1 THRU 2 DO FBAR[II]:EV(FBAR[II],ABS(C)=CBAR,C=CBAR,K=KBAR),
/** PRINT AVERAGED EQS **/
CFLOW:EPS*FACTOR(FBAR[1]),
PFLOW:1/4/KC*EV(SQRT(AL+BE*CBAR^2),ABS(CBAR)=CBAR)+EPS*FACTOR(FBAR[2]),
DERIVABBREV:TRUE,KILL(KBAR),
VAL:[DIFF(CBAR(T),T)=CFLOW,DIFF(PHIBAR(T),T)=PFLOW,
  KBAR^2=BE*CBAR^2/2/(AL+BE*CBAR^2)],
PRINT("THE AVERAGED EQUATIONS ARE"),PRINT(" "),
PRINT(VAL),PRINT(" ")
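Two numerical ingredients of the listing above can be illustrated outside MACSYMA. The first is the REDUC rewrite, which leaves YY (that is, cn') at most to the first power. The second is the mean-value table D[II], where D[n] is the mean of cn^n over one period. The Python below is a minimal sketch under our own assumptions and is not a translation of the program:

- Expressions are stored as coefficient dictionaries.
- The function names are hypothetical.
- The complete elliptic integrals are computed numerically with the arithmetic-geometric mean. This is a standard method, but the listing itself keeps KC and EC symbolic.
- The zeta-function terms handled through FZ2 are left out.

```python
import math
from fractions import Fraction

# Polynomials in XX (= cn) are dicts {power: coeff}; expressions in XX and
# YY (= cn') are dicts {(m, n): coeff} meaning coeff * XX**m * YY**n.


def poly_mul(p, q):
    """Product of two polynomials in XX, dropping zero coefficients."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {e: c for e, c in out.items() if c != 0}


def reduc(expr, k2):
    """Rewrite expr so YY appears at most to the first power, using
    YY**2 = (1 - XX**2) * (1 - k2 + k2 * XX**2) with k2 = K^2.
    Returns {(m, s): coeff} with s in {0, 1}."""
    yy2 = poly_mul({0: 1, 2: -1}, {0: 1 - k2, 2: k2})
    out = {}
    for (m, n), c in expr.items():
        factor = {0: 1}
        for _ in range(n // 2):              # (YY**2)**(n // 2)
            factor = poly_mul(factor, yy2)
        for e, a in factor.items():
            key = (m + e, n % 2)             # odd n keeps one factor of YY
            out[key] = out.get(key, 0) + c * a
    return {key: c for key, c in out.items() if c != 0}


def complete_elliptic(k):
    """K(k) and E(k) for modulus 0 <= k < 1 via the arithmetic-geometric mean."""
    a, b, c = 1.0, math.sqrt(1.0 - k * k), k
    weight, total = 0.5, 0.5 * k * k         # running sum of 2**(n-1) * c_n**2
    for _ in range(60):
        if abs(c) < 1e-15:
            break
        a, b, c = (a + b) / 2, math.sqrt(a * b), (a - b) / 2
        weight *= 2
        total += weight * c * c
    K = math.pi / (2 * a)
    return K, K * (1.0 - total)


def cn_means(k, n_max):
    """D[n] = mean of cn(u, k)**n over one period, for n = 0..n_max."""
    if k == 0:                               # cn = cos: D[n] = (n-1)/n * D[n-2]
        d = [1.0, 0.0]
        for n in range(2, n_max + 1):
            d.append((n - 1) / n * d[n - 2])
        return d[: n_max + 1]
    K, E = complete_elliptic(k)
    k2 = k * k
    d = [1.0, 0.0, (E / K - 1 + k2) / k2, 0.0]
    for n in range(4, n_max + 1):            # same recurrence as D[II] above
        d.append(((n - 2) * (2 * k2 - 1) * d[n - 2]
                  + (n - 3) * (1 - k2) * d[n - 4]) / ((n - 1) * k2))
    return d[: n_max + 1]


def average(expr, k):
    """Mean over one period; terms odd in YY (CN^M CNP) average to zero."""
    reduced = reduc(expr, k * k)
    n_max = max((m for (m, _s) in reduced), default=0)
    d = cn_means(k, n_max)
    return sum(c * d[m] for (m, s), c in reduced.items() if s == 0)


print(average({(0, 2): 1.0}, 0.0))           # mean of sin(u)**2 when k = 0
```

For k = 0, cn reduces to cos, so the call above returns 0.5, the mean of sin² u. The recurrence divides by k², so it loses digits as k approaches 0. This is one reason to treat k = 0 separately, as the listing does.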
Here is a sample run based on the example discussed in the text, eq. (1):

(c6) AVERAGE()$

PERTURBATION OF X'' + AL X + BE X^3 + EPS G(X,X',EPS*T) = 0 BY AVERAGING

ENTER AL:
1;
ENTER BE:
1/10;
ENTER G(X,X') USING Y=X':
-Y/2-31*X^2*Y/10+Y^3;

THE SOLUTION TO THE UNPERTURBED SYSTEM IS
X = C CN(4*KC(C)*PHI,K) , X' = C SQRT(AL + BE C^2) CN'(4*KC(C)*PHI,K)
WHERE 0 <= K^2 = BE C^2/2/(AL + BE C^2) <= 1/2
KC = COMPLETE ELLIPTIC INTEGRAL OF FIRST KIND
AND 4*KC(K)*PHI = SQRT(AL + BE C^2)*T+B

SEEK PERTURBED SOLUTION OF SAME FORM WHERE (C,PHI) BECOME FUNCTIONS OF TIME
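THE AVERAGED EQUATIONS ARE

$$
\begin{aligned}
\mathrm{cbar}_t &= -\frac{\mathrm{cbar}\,\mathrm{eps}}{1050\,\mathrm{kbar}^4\,\mathrm{kc}}\Bigl(24\,\mathrm{cbar}^4\mathrm{kbar}^6\mathrm{kc}+240\,\mathrm{cbar}^2\mathrm{kbar}^6\mathrm{kc}-39\,\mathrm{cbar}^4\mathrm{kbar}^4\mathrm{kc}-173\,\mathrm{cbar}^2\mathrm{kbar}^4\mathrm{kc}\\
&\qquad+175\,\mathrm{kbar}^4\mathrm{kc}+9\,\mathrm{cbar}^4\mathrm{kbar}^2\mathrm{kc}-561\,\mathrm{cbar}^2\mathrm{kbar}^2\mathrm{kc}-175\,\mathrm{kbar}^2\mathrm{kc}+6\,\mathrm{cbar}^4\mathrm{kc}+494\,\mathrm{cbar}^2\mathrm{kc}\\
&\qquad-48\,\mathrm{cbar}^4\mathrm{ec}\,\mathrm{kbar}^6-480\,\mathrm{cbar}^2\mathrm{ec}\,\mathrm{kbar}^6+72\,\mathrm{cbar}^4\mathrm{ec}\,\mathrm{kbar}^4+286\,\mathrm{cbar}^2\mathrm{ec}\,\mathrm{kbar}^4-350\,\mathrm{ec}\,\mathrm{kbar}^4\\
&\qquad-12\,\mathrm{cbar}^4\mathrm{ec}\,\mathrm{kbar}^2+314\,\mathrm{cbar}^2\mathrm{ec}\,\mathrm{kbar}^2+175\,\mathrm{ec}\,\mathrm{kbar}^2-6\,\mathrm{cbar}^4\mathrm{ec}-494\,\mathrm{cbar}^2\mathrm{ec}\Bigr),\\
\mathrm{phibar}_t &= \frac{\sqrt{\mathrm{cbar}^2/10+1}}{4\,\mathrm{kc}},\\
\mathrm{kbar}^2 &= \frac{\mathrm{cbar}^2}{20\,(\mathrm{cbar}^2/10+1)}
\end{aligned}
$$

(VAX 8530 Time = 157 sec.)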
The program gives the averaged equations in terms of both C (called cbar) and k (called kbar). The results are stored in the variable VAL. VAL[1] contains the C' equation, VAL[2] contains the φ' equation, and VAL[3] contains the expression for k² in terms of C². The following command substitutes for k in terms of C, giving eq. (39) of the text:

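(c7) FACTOR(EV(VAL[1],KBAR=SQRT(RHS(VAL[3]))));

(d7)
$$
\mathrm{cbar}_t=-\frac{\mathrm{eps}\,\bigl(5\,\mathrm{cbar}^6\mathrm{kc}+447\,\mathrm{cbar}^4\mathrm{kc}+10175\,\mathrm{cbar}^2\mathrm{kc}+64700\,\mathrm{kc}-594\,\mathrm{cbar}^4\mathrm{ec}-11880\,\mathrm{cbar}^2\mathrm{ec}-64700\,\mathrm{ec}\bigr)}{350\,\mathrm{cbar}\,\mathrm{kc}}
$$

(VAX 8530 Time = 3 sec.)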
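One way to check the transcription of (d7) is to evaluate it numerically. In the sketch below, kbar is eliminated using VAL[3] with AL = 1 and BE = 1/10, and kc and ec are computed by the arithmetic-geometric mean. The names cbar_rate and complete_elliptic are hypothetical helpers of our own.

For small amplitudes the perturbed equation reduces to x'' + x ≈ (eps/2) x'. Its classical averaged amplitude equation is C' = eps C/4, so cbar_t/cbar should tend to eps/4 as cbar → 0.

```python
import math


def complete_elliptic(k):
    """K(k) and E(k) for modulus 0 <= k < 1 via the arithmetic-geometric mean."""
    a, b, c = 1.0, math.sqrt(1.0 - k * k), k
    weight, total = 0.5, 0.5 * k * k         # running sum of 2**(n-1) * c_n**2
    for _ in range(60):
        if abs(c) < 1e-15:
            break
        a, b, c = (a + b) / 2, math.sqrt(a * b), (a - b) / 2
        weight *= 2
        total += weight * c * c
    K = math.pi / (2 * a)
    return K, K * (1.0 - total)


def cbar_rate(cbar, eps):
    """Right-hand side of (d7) as transcribed, for AL = 1 and BE = 1/10."""
    kbar = math.sqrt(cbar ** 2 / (20 * (cbar ** 2 / 10 + 1)))   # from VAL[3]
    kc, ec = complete_elliptic(kbar)
    num = (5 * cbar ** 6 * kc + 447 * cbar ** 4 * kc + 10175 * cbar ** 2 * kc
           + 64700 * kc - 594 * cbar ** 4 * ec - 11880 * cbar ** 2 * ec
           - 64700 * ec)
    return -eps * num / (350 * cbar * kc)


for c in (0.01, 0.1, 0.5):
    # ratio cbar_t / (eps * cbar); it should approach 1/4 as cbar -> 0
    print(c, cbar_rate(c, 1.0) / c)
```

This checks only the leading-order behaviour. The higher powers of cbar in (d7) are not independently verified here.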
The Effective Use of Computer Algebra Systems
Denver, Colorado 8020S


Abstract 

In  this  paper  we  give  an  outline  for  an  applied  computer  algebra  course 
which  describes  the  technical  skills  needed  to  effectively  use  a  computer 
algebra  system  to  solve  symbolic  mathematical  problems  in  science  and 
engineering. 


1  Introduction. 

A Computer Algebra System (CAS) is a powerful computer program that can
manipulate and analyze symbolic mathematical expressions. Computer algebra
systems are quite easy to use, and it is easy for both students and
professionals to learn to use them in a superficial way. For example, to
compute an indefinite integral or find the closed form solution to an elementary
differential equation, one needs to master only a few simple operations. However,
to really understand the potential uses (and limitations) of these systems, a
mathematical scientist¹ must have:

• An understanding of the kind of mathematical knowledge contained in a
CAS and of the extent and reliability of this knowledge.

•  Some  knowledge  of  computer  algebra  programming  techniques  including 
recursion  and  list  manipulation. 

• The ability to formulate a symbolic mathematical problem in an algorithmic
way and the ability to express the algorithm in terms of the mathematical
operations and programming structures available in a computer
algebra language.

¹We use the term mathematical scientist as a generic term to represent mathematicians,
computer scientists, physical scientists, engineers, statisticians, economists and others who
use mathematical reasoning in their work.




•  Some  understanding  about  how  a  CAS  works. 

• Exposure to a collection of examples that illustrate the successful use
of a CAS.

• A feeling for which symbolic calculations are best done by hand and which
are best done by a CAS.

In  this  paper,  we  shall  outline  a  course  in  applied  computer  algebra.  The  course 
outline  describes  the  technical  skills  needed  to  effectively  use  a  CAS  to  solve 
symbolic  mathematical  problems  in  science  and  engineering. 


2  A  Frustrating  Example. 

Let  us  begin  by  taking  a  critical  look  at  an  example  that  illustrates  the  use  of  a 
CAS.  The  problem  we  have  chosen  is  from  modern  physics.  It  emphasizes  many 
of  the  issues  faced  by  the  mathematical  scientist  who  wishes  to  use  a  CAS  to 
help  solve  a  problem. 

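Consider the time-dependent Schroedinger equation for the one-electron
atom

$$
-\frac{\hbar^2}{2m_1}\left(\frac{\partial^2\psi_T}{\partial x_1^2}+\frac{\partial^2\psi_T}{\partial y_1^2}+\frac{\partial^2\psi_T}{\partial z_1^2}\right)
-\frac{\hbar^2}{2m_2}\left(\frac{\partial^2\psi_T}{\partial x_2^2}+\frac{\partial^2\psi_T}{\partial y_2^2}+\frac{\partial^2\psi_T}{\partial z_2^2}\right)
+V\psi_T=i\hbar\frac{\partial\psi_T}{\partial t}\tag{1}
$$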

where $\psi_T=\psi_T(x_1,x_2,y_1,y_2,z_1,z_2,t)$. If we make the following change of variables

$$
\begin{aligned}
x&=\frac{m_1x_1+m_2x_2}{m_1+m_2}, & y&=\frac{m_1y_1+m_2y_2}{m_1+m_2}, & z&=\frac{m_1z_1+m_2z_2}{m_1+m_2},\\
r&=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}, & \phi&=\arctan\left(\frac{y_2-y_1}{x_2-x_1}\right), & \theta&=\arccos\left(\frac{z_2-z_1}{r}\right),
\end{aligned}\tag{2}
$$

equation  (1)  is  transformed  to  the  form 


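$$
\begin{aligned}
&-\frac{\hbar^2}{2(m_1+m_2)}\left(\frac{\partial^2\psi_T}{\partial x^2}+\frac{\partial^2\psi_T}{\partial y^2}+\frac{\partial^2\psi_T}{\partial z^2}\right)\\
&-\frac{\hbar^2}{2\mu}\left[\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\psi_T}{\partial r}\right)+\frac{1}{r^2\sin^2\theta}\frac{\partial^2\psi_T}{\partial\phi^2}+\frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\psi_T}{\partial\theta}\right)\right]\\
&+V\psi_T=i\hbar\frac{\partial\psi_T}{\partial t}
\end{aligned}\tag{3}
$$

where

$$
\mu=\frac{m_1m_2}{m_1+m_2}.
$$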

The transformation appears in the text by Eisberg [5, pp. 295-297], where the
author remarks, "This is actually a quite tedious task, and so we present here only
the results." This example is similar to the polar and spherical transformations
of the Laplacian and other vector analysis operators which appear in texts on
advanced engineering mathematics. This problem is more involved, however, since there
are six independent variables.
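The list of questions below notes that the inverse of (2) has to be entered by hand. It is therefore cheap insurance to test that inverse numerically before handing both transformations to a CAS.

The Python sketch below maps two particle positions to centre-of-mass coordinates and relative spherical coordinates, then maps them back. It assumes, as in (2), that r, θ and φ describe the vector from particle 1 to particle 2. The function names are hypothetical.

```python
import math


def to_relative(m1, m2, p1, p2):
    """Change of variables (2): centre of mass (x, y, z) and spherical
    coordinates (r, theta, phi) of the vector from particle 1 to particle 2."""
    M = m1 + m2
    x, y, z = ((m1 * a + m2 * b) / M for a, b in zip(p1, p2))
    dx, dy, dz = (b - a for a, b in zip(p1, p2))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.acos(dz / r)
    phi = math.atan2(dy, dx)   # quadrant-aware arctan((y2 - y1)/(x2 - x1))
    return x, y, z, r, theta, phi


def from_relative(m1, m2, x, y, z, r, theta, phi):
    """Inverse of (2): recover (x1, y1, z1) and (x2, y2, z2)."""
    M = m1 + m2
    rel = (r * math.sin(theta) * math.cos(phi),
           r * math.sin(theta) * math.sin(phi),
           r * math.cos(theta))
    centre = (x, y, z)
    p1 = tuple(c - m2 / M * d for c, d in zip(centre, rel))
    p2 = tuple(c + m1 / M * d for c, d in zip(centre, rel))
    return p1, p2


def reduced_mass(m1, m2):
    """mu = m1 m2 / (m1 + m2), as defined after (3)."""
    return m1 * m2 / (m1 + m2)


p1, p2 = (0.3, -1.2, 2.0), (-0.7, 0.4, 1.1)   # note x2 - x1 < 0
coords = to_relative(1836.15, 1.0, p1, p2)
print(from_relative(1836.15, 1.0, *coords))   # recovers p1 and p2 up to rounding
```

We use atan2 rather than a literal arctan((y2 - y1)/(x2 - x1)). In the sample call x2 - x1 < 0, so the plain arctangent would put φ in the wrong quadrant and the round trip would fail. A CAS has to be told about branch issues of exactly this kind.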

A number of years ago the author attempted to verify this transformation
using a CAS. The point of the exercise was to illustrate to a class of
mathematics, engineering and physical science students the power of a CAS to handle
tedious but routine calculations. This problem is typical of the type of symbolic
calculations a scientist or engineer might encounter in his work. Let us suppose
this person is lucky enough to have a powerful workstation with computer algebra
software in the office and has had some experience with a CAS. Let us also
suppose that he needs to verify this transformation and must decide
whether to do this with pencil and paper or with a CAS. A number of questions
of a philosophical nature come to mind:

• Can this calculation be done by a CAS? Since this calculation is
straightforward, the answer is presumably "yes."

• Should this calculation be done by a CAS or with paper and pencil? There
are a number of important considerations here. First, there is the question
of time and effort. Will using a CAS require less time than hand
calculation? Next, there is a question of accuracy. This is particularly important
if the final answer to the problem is unknown. Presumably, if the problem
is accurately entered into the computer, if the user has chosen the
appropriate commands, and if the CAS is free of bugs, the CAS should produce
an accurate result. Finally, will important side effects of a pencil and
paper calculation be lost? In the course of a derivation, other important
relationships often appear which would not be apparent if the calculation
were done with a CAS.

•  Can  this  problem  be  done  by  a  novice  symbolic  programmer  or  is  expert 
knowledge of a CAS required? Presumably, our hypothetical mathematical
scientist  is  closer  to  the  novice  than  the  expert  and  is  more  interested  in 
obtaining  insight  into  the  scientific  problem  than  learning  about  computer 
algebra. 

•  Is  this  calculation  a  direct  application  of  the  mathematical  knowledge  in  a 
CAS  or  does  it  require  a  symbolic  program  which  might  include  conditional 
statements,  loops  and  subroutines?  Ideally,  one  would  like  to  have  a  single 
command  which  takes  (1)  and  (2)  as  input  data,  and  returns  (3)  as  a 
result. If this command is not available, we must combine a number of
commands  into  a  symbolic  program. 

• What algorithm should be used to perform the calculation? Can we
imitate the steps found in a textbook? A starting point for an algorithm
is the transformation of the Laplacian from rectangular to polar
coordinates found in many advanced engineering textbooks (for example,
see Kreyszig [8], pages 447-448). However, most textbook derivations are
written to facilitate human understanding. They are not written with a
computer algebra system in mind. For example, the Laplacian derivation
cited above contains many local substitutions which make it
understandable to the human reader. Is it necessary to follow the same approach
with a CAS derivation?

• How does one express the algorithm in terms of the operations and data
structures available in a computer algebra language? In the example
considered here, the primary question is how to deal with the undefined
function $\psi_T$. Most (but not all) computer algebra systems have two ways
to express undefined relationships between variables. The first way
represents an undefined relationship explicitly with an expression similar to
f(x, y). The other way declares that the symbol f depends on the
symbols x and y, and then carries out all calculations with derivatives in this
environment. Will both representations produce the desired result?

• Can this calculation be done with any CAS? Will the same algorithm work
with all systems? Perhaps the mathematical scientist is wondering which
of many available systems should be used. At the time this problem
was tried, the author had these four systems available: MACSYMA [13],
REDUCE [6], MAPLE [1], and muMath [10].

• What input is required for this calculation? Obviously, one has to input
the original equation (1) and the transformation (2). However, the
inverse transformation, which expresses the variables $x_1, x_2, y_1, y_2, z_1, z_2$ in
terms of the variables $x, y, z, r, \phi, \theta$, is also required. Although computer
algebra systems are able to solve some systems of nonlinear equations,
no CAS is able to invert the transformation (2). Therefore, the inverse
transformation must also be input into the system.

• Will a CAS produce the intended result or will the result appear in a
mathematically equivalent form? Although a CAS can perform some remarkable
simplifications,  it  is  often  difficult  to  transform  an  expression  to  the  exact 
form  found  in  a  textbook. 

• Is trigonometric simplification possible with a CAS? Theoretically, it is
impossible to do all the simplifications we may wish to do with a CAS. In this
case, however, the problem is primarily one of trigonometric simplification,
which involves repeated application of the identity $\sin^2 u + \cos^2 u = 1$.

• How much CPU time will the calculation require? Obviously, we should
only perform the calculation if it is within the capabilities of our
equipment.

• How will we know if the answer is correct? Since the problem is one of
verification, we know what we are looking for. If we had not known the
result, we could apply the inverse transformation to the result to see if we
could obtain the original equation. In general, determining whether a result is
correct poses a difficult problem.
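The last question above can often be answered numerically even when no reference answer is available. As an illustration, take the simpler two-dimensional case mentioned earlier, the polar form of the Laplacian.

The Python sketch below compares three values at one point: a finite-difference Laplacian in rectangular coordinates, the polar formula u_rr + u_r/r + u_θθ/r² evaluated by finite differences, and the exact value. The test function, step size and evaluation point are arbitrary choices of ours. Agreement at a point is evidence, not proof.

```python
import math


def f(x, y):
    """Arbitrary smooth test function (our choice, for illustration only)."""
    return math.exp(x) * math.sin(2 * y) + x ** 3 * y


def exact_laplacian(x, y):
    """Laplacian of f worked out by hand: f_xx + f_yy."""
    return -3 * math.exp(x) * math.sin(2 * y) + 6 * x * y


def laplacian_cartesian(func, x, y, h=1e-3):
    """Five-point finite-difference Laplacian in rectangular coordinates."""
    return (func(x + h, y) + func(x - h, y) + func(x, y + h)
            + func(x, y - h) - 4 * func(x, y)) / h ** 2


def laplacian_polar(func, r, t, h=1e-3):
    """Finite-difference value of u_rr + u_r / r + u_tt / r**2 for
    u(r, t) = func(r cos t, r sin t), i.e. the polar form being checked."""

    def u(rr, tt):
        return func(rr * math.cos(tt), rr * math.sin(tt))

    u_rr = (u(r + h, t) - 2 * u(r, t) + u(r - h, t)) / h ** 2
    u_r = (u(r + h, t) - u(r - h, t)) / (2 * h)
    u_tt = (u(r, t + h) - 2 * u(r, t) + u(r, t - h)) / h ** 2
    return u_rr + u_r / r + u_tt / r ** 2


r, t = 1.3, 0.7
x, y = r * math.cos(t), r * math.sin(t)
print(laplacian_cartesian(f, x, y), laplacian_polar(f, r, t), exact_laplacian(x, y))
```

The three printed values agree closely. The differences are finite-difference error of order h². A sign error or missing term in the polar formula would show up as a much larger discrepancy.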

In retrospect, the transformation of (1) to (3) is not a particularly difficult
calculation for a CAS. However, it would have been difficult to convince the
author of this when he was trying to verify the result with a CAS. The question
is not so much whether someone with an intimate knowledge of some CAS can
obtain the result by pulling a few commands out of a hat. Rather, the question
is whether a person with moderate knowledge of a CAS can obtain the result
in a reasonable amount of time.

It  would  be  interesting  (in  fact  comical!)  to  review  the  path  (including  all 
the  false  starts)  taken  by  the  author  to  solve  this  problem  with  a  CAS.  We  shall 
not  go  into  all  the  technical  details  in  this  paper.  Rather,  we  shall  give  some 
overall  impressions  of  the  experience. 

• A preliminary analysis of the capabilities of the four systems indicated
that MACSYMA had the greatest chance of success with this
particular problem. Although it is not apparent from reading the
manuals, the other three systems did not have the capability to apply the
differentiation chain rule to abstract (undefined) functions². Since this
rule is needed in the derivation, these systems were dropped from
consideration. Certainly, this limitation may be eliminated from these systems
by modifying or even rewriting the differentiator.

• The problem took about one and a half weeks of the author's time and
over  30  test  files  before  a  correct  program  was  obtained. 

• Although it was relatively easy to write a program to transform a
two-dimensional version of the problem, it was not so easy to modify the result
for three dimensions. In fact, the obvious generalization of the
two-dimensional program to three dimensions originally produced a result which was
nearly correct but contained a few superfluous terms which did not
simplify to zero. Luckily, in this case, the final result was known. Otherwise
this incorrect result might have been accepted as correct!

²In fact, none of the systems handles the differentiation and integration of undefined
functions in a satisfactory manner. See Cohen [2] and Wester [14].
• Originally, we ignored many of the local substitutions and simplifications
found in textbook derivations for problems of this sort. Instead, we relied
on a brute force approach which required an unacceptable amount of CPU
time (up to four hours on a VAX 750). By performing some local
substitutions we were eventually able to get the derivation down to 15 minutes
of CPU time.

• To use a CAS effectively, it is essential to thoroughly understand the
semantics of commands in the system. Unfortunately, a precise description
of a command's function rarely appears in the system manual. In the
course of trying the above example, significant differences were observed
in the same command from system to system. A program's behavior in
one system may be quite different from its behavior in another system.

3  A  Course  In  Applied  Computer  Algebra. 

We believe that the CAS experience described above is not unique. There is
more to using a CAS than reading the manual, seeing a few examples and
trying a few commands. Numerical analysts have always emphasized that using
a numerical method in an inappropriate way can lead to disastrous results. We
believe the same is true for symbolic methods. In the remainder of the paper,
we describe an applied computer algebra course which is intended to address
some of these issues.

There are currently many CAS courses being taught in the US, Canada
and Europe. They range from language courses, which describe how to use a
particular CAS to solve a variety of symbolic mathematical problems,
to more advanced courses which concentrate on the mathematical background
needed to develop efficient algorithms for computer algebra. To make an
analogy with numerical computation, the former courses are similar to scientific
programming courses which teach the mechanics of a programming language
(FORTRAN or Pascal) and, in some cases, the use of numerical or statistical
software packages. The advanced CAS course is similar to a course in
numerical analysis which includes a theoretical discussion of numerical algorithms.
In the numerical setting, numerical methods courses, which lie between these two
extremes, serve to introduce a mathematical scientist to some of the issues and
applications of numerical computation (for example, see Conte and de Boor [3]
or Dorn [4]). These courses are taught in mathematics departments, computer
science departments, and other science and engineering departments. The goal
is to introduce the issues surrounding numerical computation and to emphasize
its advantages and pitfalls. The applied computer algebra course we have in
mind is the computer algebra analogue to the numerical methods course.

The following four premises underlie the design of the course:

1. The course is designed for mathematical scientists whose primary interests
are not in computer algebra or even computer science. This population
has an extremely diverse background in both mathematics and
computing. A safe lower bound for backgrounds is to assume the usual two-year
freshman-sophomore mathematics sequence (through multivariable
calculus, linear algebra and applied differential equations), plus some
experience with numerical programming. Many in the audience will not have
studied discrete mathematics, abstract algebra or programming concepts
such as recursion and list manipulation. This limits the set of examples
that can be used to demonstrate the capabilities of a CAS. It also means
that mathematical concepts from these areas which are needed to write
symbolic programs must be integrated into the course material.

2. The course is oriented toward algorithms rather than a particular CAS
language. Although the notion of an algorithm for a symbolic computation
is implicit in traditional mathematics, it is not usually the focus of the
subject. Techniques for solving a problem are usually not stated in the
formal way that they are in numerical methods or computer programming.
For example, although differentiation is formally a recursive process, it is
usually described in an informal way in a mathematics textbook. Rather
than emphasize a particular CAS language, the goal should be to develop
the skills needed to create symbolic algorithms.

Of course, it is important to include some programming in one or more
CAS languages. However, computer algebra systems are evolving rapidly
and new systems are being developed³. Rather than learn all the
details and eccentricities of a particular language, it is more useful to learn
the general principles of CAS languages and an approach to evaluating the
capabilities of a particular language.

3. Examples of CAS applications should be chosen to illustrate both the
possibilities and limitations of computer algebra and should emphasize the role
of computer algebra in the problem solving process. Most written
material currently available on computer symbolic computation
tries to promote the field rather than give a balanced, realistic view of
where a CAS can be used. In solving mathematical problems, a CAS is
only one of many tools that can help solve a problem. The important question
is: what is the place for computer symbolic computation in the problem
solving process? What is the role for the mathematical scientist and what
is the role for the machine?

4. Pencil and paper symbolic calculation is more important than ever.
Occasionally, it is said that computer algebra will revolutionize the way we do
mathematics by eliminating the need for much symbolic computation with
pencil and paper. We believe that this statement is misleading and gives
a false picture of the role that computer algebra can play in the problem
solving process. A CAS can perform many symbolic calculations which
are ordinarily done by hand. However, we believe that to use a CAS effectively,
the mathematical scientist must be good at pencil and paper symbolic
calculation and have a good understanding of the underlying mathematical
concepts. This understanding is essential to recognizing situations where
a CAS can be applied.

³During the past year, two new CAS systems, MATHEMATICA (see Wolfram [15]) and
DERIVE ([12]), have been introduced.

There is another, more subtle reason to emphasize the importance of hand
calculation. Some people tend to immediately try a
computation with a CAS in the hope that the system will miraculously
produce the intended result. Using a CAS in this way can be
counterproductive. A more useful approach is to spend some time thinking about
a problem with pencil and paper. Perhaps the problem can be put into
a more convenient form which leads to a transparent solution or which
provides some unexpected information.
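Premise 2 observes that differentiation is formally a recursive process. The following minimal Python sketch of a recursive differentiator makes that concrete. It also applies the chain rule to an undefined function, the capability discussed in Section 2. The tuple representation and the names d and simp are our own and are not any system's API. Simplification is deliberately minimal.

```python
# Expression trees (our own representation): numbers, variable names (str),
# or tuples ("+", a, b), ("*", a, b), ("sin", a), ("cos", a), ("exp", a).
# Any other one-argument head, e.g. ("f", a), is an undefined function.


def d(e, x):
    """Derivative of expression e with respect to the variable named x."""
    if isinstance(e, (int, float)):
        return 0
    if isinstance(e, str):
        return 1 if e == x else 0
    op = e[0]
    if op == "+":                                  # sum rule
        return ("+", d(e[1], x), d(e[2], x))
    if op == "*":                                  # product rule
        u, v = e[1], e[2]
        return ("+", ("*", d(u, x), v), ("*", u, d(v, x)))
    if len(e) != 2:                                # e.g. ("deriv", f, u): not handled
        raise NotImplementedError(op)
    u = e[1]
    if op == "sin":
        outer = ("cos", u)
    elif op == "cos":
        outer = ("*", -1, ("sin", u))
    elif op == "exp":
        outer = e
    else:                                          # undefined f: record f'(u)
        outer = ("deriv", op, u)
    return ("*", outer, d(u, x))                   # chain rule


def _is_number(t):
    return isinstance(t, (int, float))


def simp(e):
    """Remove the trivial 0 and 1 terms that d produces (nothing more)."""
    if not isinstance(e, tuple):
        return e
    args = [simp(a) for a in e[1:]]
    if e[0] == "+":
        a, b = args
        if a == 0:
            return b
        if b == 0:
            return a
        if _is_number(a) and _is_number(b):
            return a + b
    elif e[0] == "*":
        a, b = args
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
        if _is_number(a) and _is_number(b):
            return a * b
    return (e[0], *args)


print(simp(d(("f", ("*", 2, "x")), "x")))
# ('*', ('deriv', 'f', ('*', 2, 'x')), 2), i.e. f'(2x) * 2
```

Each rule in d calls d on smaller pieces, so the recursion mirrors the textbook rules directly. Treating an unknown head as an undefined function is one design choice among several. A real system must also decide how to represent higher derivatives and partial derivatives of such functions.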

Course  Organization. 

There  are  three  important  components  to  an  applied  computer  algebra  course: 

1.  An  exploration  of  the  capabilities  of  a  CAS. 

2.  A  discussion  of  symbolic  programming  techniques  including  recursion  and 
list  manipulation. 

3. A discussion of some of the elementary algorithms which make a CAS
work.

We  shall  discuss  each  component  in  greater  detail  in  the  following  sections. 

4  Exploring  The  Capabilities  of  a  CAS. 

Many mathematical scientists do not use a CAS because they do not
understand its capabilities or do not believe it can be useful in their work. It is easy to
understand where this feeling comes from. The manipulation of mathematical
expressions is a difficult intellectual exercise which often requires insight as well
as perseverance. Indeed, even those who have considerable experience with a
CAS may find it difficult to describe what it can do or when it might be useful
for a particular problem. Some even take the point of view that
it is too difficult to define precisely what a CAS does. They believe a
user should simply try a problem with a CAS and see what happens.

The mathematical knowledge in a CAS is defined by the properties of the
various mathematical operators in the system (expand, factor, limit,
differentiate, integrate, etc.) and the automatic simplification rules which are applied to
an expression. The capabilities of most mathematical operators can vary from
system to system (sometimes dramatically) and may change significantly when
a new version of a system is introduced. For example, some CAS systems have
the capability to compute the limit of a function or a sequence. In the manual
which accompanies the CAS, the semantic action of the limit operator is usually
loosely described rather than precisely defined. This description will include a
few isolated examples of how the operator is applied but little information about
what to expect in non-trivial examples. If one were to scan through a typical
text on applied mathematics, one would find that the limit operation is used
in many different contexts, some very specific and some quite abstract. For
which limit operations should we expect a CAS to produce a reasonable
result?

Fig. 1 shows the results of applying two computer algebra systems⁴ to some
limit problems encountered in undergraduate mathematics. Example 1 requires
two applications of L'Hospital's rule and is easily computed by both systems.
Example 2 is similar but requires n applications of L'Hospital's rule, where n is
left unspecified. Nevertheless, both systems are able to compute the result. However,
MACSYMA requested additional information about x and n which in this case is
extraneous. The limit in Example 3 requires another approach. The expression
is the absolute value of the nth term of the series


$$
\sum_{n=0}^{\infty}\frac{x^n}{n!},
$$

which converges by the ratio test. Therefore,

$$
\lim_{n\to\infty}\frac{|x|^n}{n!}=0.
$$

Neither system was able to compute this limit. Example 4 is the formal
definition of the derivative for sin x. Both systems are able to compute this limit,
but MACSYMA requested additional but extraneous information about the sin
function. Example 5 is a similar calculation (the transformation s = 1/t
transforms the limit to the derivative of exp(x) at x = 0), but surprisingly
MACSYMA is unable to compute the limit. Example 6 is the general definition of
the derivative. However, neither system recognizes this fact even though both
systems have some capability to work with undefined functions. Example 7
is a famous result due to Euler. The author was quite surprised to find that
MAPLE can compute this limit. Example 8 is the Laplace transform of sin(t)/t.
MACSYMA is able to compute this limit by requesting information about the
sign of s. MAPLE was unable to compute the limit.
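Suppose Example 3 is the term |x|^n/n! of the exponential series. Figure 1 itself is not reproduced, so this is our assumption. The ratio-test argument can then be seen numerically. Successive terms satisfy a_{n+1}/a_n = |x|/(n+1), which drops below 1 once n + 1 > |x| and tends to 0.

A short Python sketch follows; scaled_terms is a hypothetical name.

```python
import math


def scaled_terms(x, n_max):
    """Terms a_n = |x|**n / n! for n = 0..n_max, built from the ratio
    a_{n+1} / a_n = |x| / (n + 1) used in the ratio test."""
    terms = [1.0]
    for n in range(n_max):
        terms.append(terms[-1] * abs(x) / (n + 1))
    return terms


terms = scaled_terms(10.0, 100)
print(terms[100])                          # about 1.07e-58
print(sum(terms), math.exp(10.0))          # partial sum vs e^10
```

The terms grow until n is about |x| and then fall off faster than any geometric sequence. Building them from the ratio also avoids overflow in |x|**n and n!.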

What is a user to make of this performance? Certainly all these limits may
arise in the course of doing mathematics and are fair requests of a CAS. If the

⁴The systems are MACSYMA 309.6 and MAPLE 4.1. MACSYMA is a trademark of
Symbolics Inc.
Figure 1: Examples of limit operations in two computer algebra systems. The
symbol U means "unable to compute."


person is familiar with the computer algebra field and has kept up with the work
on limits, he might know about the difficulties which arise when programming
a CAS to compute limits, and thus be willing to excuse the system when
it fails to produce a result. Most users will not have this knowledge and may
end up wondering just what a system can do and whether to trust the results.
To use a CAS effectively, a user must have a clear idea about the semantic
capabilities of mathematical operators in the system. The exploration of the
semantic capabilities of one or more CAS systems is an important component
of an applied computer algebra course.

We use the phrase semantic capacity to refer to the mathematical power
of an operator in a computer algebra system. In many cases it is difficult to
define the semantic capacity precisely, since even the developer of a CAS may
not know exactly what an operator can and cannot do. Nevertheless, we believe
that this is an important question and it should be discussed even if it cannot
be completely answered.

We do not believe it is practical or interesting to simply list in detail the
capabilities of an operator in a CAS. The capacity of an operator may change
when new versions of a CAS are released, and it is difficult for a casual user to
keep details of this sort in mind. An alternative approach is to discuss what
a mathematical operator should ideally be able to do and compare this in a
general way to what current systems are capable of doing. For example, consider
the differentiation operator. Obviously, a CAS should be able to differentiate
most specific functions no matter how complicated. However, in mathematical
calculation we also differentiate implicit functions, functions defined by integrals
(including differentiation under the integral sign), functions defined by infinite
series, and undefined functions (general f(x), g(x, y)). We differentiate with
respect to a variable, a function and even a differentiation symbol such as y'.
We compute ordinary derivatives, partial derivatives and total derivatives. Is a
CAS able to perform the differentiation operation correctly in all these cases?
A carefully chosen list of exercises can help a user explore the semantic capacity
of an operator.

Properly  Posed  Requests. 

As with all forms of programming, the mathematical scientist should take
care to ensure that the input to the system makes mathematical and
computational sense. Informally speaking, an operation is said to be properly
posed if:


• The execution of the operation produces a meaningful mathematical
expression. It is quite easy to get a CAS to return absurd-looking results.
Therefore, the mathematical scientist must always carefully inspect the
result returned by the system to ensure that it makes mathematical sense
and satisfies all the explicit (and implicit) assumptions in a problem.

• The CAS has all the information needed to perform the operation in an
unambiguous manner. A CAS is sometimes asked to perform a mathematical
operation without all the necessary information. In this case, the
system may not perform the operation, may request additional information,
or may even return a result which is not entirely correct. To obtain
a satisfactory result, the mathematical scientist must supply additional
information to the system or modify the request to remove the ambiguity.

Fig. 2 is a MACSYMA session which illustrates a few instances of operations
which are not properly posed. Line c1 assigns an equation to a variable d1.
Line c2 substitutes the value x = -3 into d1. In response, MACSYMA returns
the expression d2. We consider this substitution operation to be improperly
posed since it returns an absurd expression even though it is perfectly legal in
MACSYMA. The problem here is the ambiguous use of the equal sign,
which has a number of different uses in mathematics and therefore a number of
different uses in a CAS. In the present context, no semantic meaning is assigned
to the equality and the system is not aware that the last result is incorrect. A
similar computation can be done in both MAPLE and muMath. The REDUCE
system does not accept the substitution and returns an uninformative error
message.

In  statement  c3  we  request  the  system  to  evaluate  the  indefinite  integral 

$$\int x^n \, dx$$




    (c1) x^2+4=x-1;
    (d1)                 x^2 + 4 = x - 1
    (c2) subst(x=-3,d1);
    (d2)                 13 = - 4
    (c3) integrate(x^n,x);
    Is n + 1 zero or nonzero?
    nonzero;
    (d3)                 x^(n + 1)/(n + 1)
    (c4) integrate(sin(omega*t)*exp(-s*t),t,0,inf);
    Is omega positive, negative, or zero?
    positive;
    Is s positive, negative, or zero?
    positive;
    (d4)                 omega/(s^2 + omega^2)

Figure 2: A MACSYMA session demonstrating statements that are improperly
posed.




where n does not depend on x but is otherwise undefined. This statement is
improperly posed since the result depends on whether or not n = -1, which
is unknown at this point. In this case, MACSYMA queries the user for more
information about the value of n. We have informed the system that n + 1 ≠ 0,
and the system returns the appropriate result d3. The computation was tried
on three other computer algebra systems (muMath, MAPLE and REDUCE).
Each automatically assumed n ≠ -1 and returned $x^{n+1}/(n+1)$.
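Written out, the case split behind MACSYMA's question is the standard one; the three systems that did not ask silently returned the first branch:

$$\int x^n \, dx = \begin{cases} \dfrac{x^{n+1}}{n+1} + C, & n \neq -1, \\[6pt] \ln|x| + C, & n = -1. \end{cases}$$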

In the next statement (c4), we ask MACSYMA to evaluate the integral

$$\int_0^\infty \sin(\omega t)\, e^{-st}\, dt.$$

This improper integral represents the Laplace transform of the function sin ωt.
It converges when s > 0 and otherwise does not converge. Without information
about the sign of s, the statement is improperly
posed. MACSYMA realizes this fact and requests more information about s.
MACSYMA also requests unnecessary information about the sign of omega. In
both cases, we indicate that the variables are positive and MACSYMA returns
the result (d4). The muMath, MAPLE and REDUCE systems are unable to
evaluate this improper integral.
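The result (d4) can be checked numerically. The sketch below is our own illustration in plain Python (not something any of the systems above provide): it truncates the improper integral at an assumed cutoff `t_max` and applies the composite Simpson rule, which is adequate only because s > 0 makes the neglected tail at most exp(-s*t_max)/s.

```python
import math

def laplace_sin_numeric(omega, s, t_max=60.0, n=60000):
    """Approximate the integral of sin(omega*t)*exp(-s*t) over [0, t_max]
    with the composite Simpson rule; n must be even. For s > 0 the tail
    beyond t_max is at most exp(-s*t_max)/s, so a finite cutoff suffices."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = t_max / n

    def f(t):
        return math.sin(omega * t) * math.exp(-s * t)

    total = f(0.0) + f(t_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)   # Simpson weights 4, 2, 4, ...
    return total * h / 3

def laplace_sin_closed_form(omega, s):
    """The result (d4) of Fig. 2; valid only for s > 0."""
    return omega / (s * s + omega * omega)

print(laplace_sin_numeric(2.0, 1.0), laplace_sin_closed_form(2.0, 1.0))
```

For s ≤ 0 no cutoff gives a stable answer: the truncated integrals keep oscillating (s = 0) or grow without bound (s < 0), which is why the request is improperly posed until the sign of s is known.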

The  question  of  when  an  operation  is  properly  posed  is  an  important  aspect 
of  operator  capacity.  It  is  important  to  understand  that  a  CAS  must  occasionally 
make  assumptions  about  the  nature  of  variables  in  an  expression  and  that  the 
result  produced  by  a  system  may  not  be  correct  in  all  situations. 

Simplification  Context. 

For efficiency reasons it is unreasonable to expect a CAS to apply all its
simplification rules during the course of a computation. The designer of a CAS
must choose which simplification rules are appropriate for a particular operation.
We use the term simplification context to refer to those simplification rules which
are applied during the evaluation of a mathematical operator. The simplification
context often determines the form of the output of an operator and in some cases
determines whether or not a CAS can even perform an operation.

For a simple example, consider the MAPLE session in Fig. 3. At the first
prompt, u is assigned a polynomial in x with coefficients which are polynomials
in a. At the second prompt, we request that MAPLE determine the degree of
u in x. The system returns the value 2 even though the coefficient of x^2
simplifies to 0. In this example, the expand simplification rules apparently are
not part of the simplification context of the degree operator.
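The Fig. 3 behavior can be mimicked in a few lines. The sketch below uses a hypothetical representation of our own (a polynomial in x is a dict from powers of x to unexpanded coefficient trees in a) and contrasts a degree operator whose simplification context excludes expansion with one that expands each coefficient first. It illustrates the idea only; it is not how MAPLE implements `degree`.

```python
from collections import defaultdict

# Hypothetical coefficient trees in the parameter a: an int, the symbol "a",
# or a tuple ("+", e1, e2, ...) / ("*", e1, e2, ...).
def expand(e):
    """Expand a coefficient tree into a dict {power of a: integer coefficient}."""
    if isinstance(e, int):
        return {0: e} if e else {}
    if e == "a":
        return {1: 1}
    op, *args = e
    if op == "+":
        out = defaultdict(int)
        for arg in args:
            for k, c in expand(arg).items():
                out[k] += c
    elif op == "*":
        out = {0: 1}
        for arg in args:
            nxt = defaultdict(int)
            for k1, c1 in out.items():
                for k2, c2 in expand(arg).items():
                    nxt[k1 + k2] += c1 * c2
            out = nxt
    else:
        raise ValueError(f"unknown operator {op!r}")
    return {k: c for k, c in out.items() if c != 0}   # drop cancelled terms

def degree_syntactic(u):
    """Largest power of x whose coefficient is not literally the integer 0."""
    return max(k for k, coeff in u.items() if coeff != 0)

def degree_expanded(u):
    """Largest power of x whose coefficient is nonzero after expansion."""
    return max(k for k, coeff in u.items() if expand(coeff))

# u = (a^2 - 1 - (a+1)(a-1)) x^2 + 2 x + 3, keyed by the power of x
lead = ("+", ("*", "a", "a"), -1, ("*", -1, ("+", "a", 1), ("+", "a", -1)))
u = {2: lead, 1: 2, 0: 3}
print(degree_syntactic(u), degree_expanded(u))
```

The first function answers 2, as MAPLE did; the second answers 1, at the cost of expanding every coefficient, which is exactly the trade-off a designer makes when choosing a simplification context.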

The simplification context will often determine whether or not a CAS is able
to perform an operation. For example, consider the indefinite integration

$$\int \bigl(2x\cos(x^2 + x) + \cos(x^2 + x)\bigr)\,dx = \sin(x^2 + x) + C. \qquad (5)$$
    > u := (a^2-1-(a+1)*(a-1))*x^2 + 2*x + 3;

                 u := (a^2 - 1 - (a + 1)(a - 1)) x^2 + 2 x + 3

    > degree(u,x);

                 2

Figure 3: A MAPLE session demonstrating a simplification context.


If the integrand is first factored, the integral can be easily evaluated by making
the substitution u = x^2 + x. In factored form, the four systems available to the
author (MACSYMA, MAPLE, REDUCE and muMath) are able to evaluate
the integral. However, in the form (5), only REDUCE was able to evaluate the
integral. Apparently, factorization is part of the simplification context of the
integration operator in REDUCE but not part of the simplification context of
the operator in the other three systems.
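For comparison, the hand calculation is a one-line substitution once the integrand of (5) is factored:

$$\int \bigl(2x\cos(x^2+x) + \cos(x^2+x)\bigr)\,dx = \int (2x+1)\cos(x^2+x)\,dx, \qquad u = x^2 + x,\quad du = (2x+1)\,dx,$$

$$\int \cos u \, du = \sin u + C = \sin(x^2 + x) + C.$$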

Experiments with these computer algebra systems have shown that it can be
difficult to determine exactly which simplification rules are applied during
the evaluation of an operator and at which point of the computation the rules
are applied. Nevertheless, the simplification context is an important aspect of
operator capacity and should be raised as an issue even if it cannot be evaluated
precisely.

Simplification  of  Mathematical  Expressions  With  A  CAS. 

An important application of computer algebra is the simplification of
involved mathematical expressions. It is easy to give examples which are difficult
to simplify by hand but can be routinely simplified with a CAS (see [11]).
Unfortunately, it is also easy to find simplifications which a CAS is unable to do.
For most people, this is not particularly surprising since simplification with
pencil and paper often requires considerable insight, clever substitutions and
application of involved identities.

Although many simplifications involve concepts which most mathematical
scientists consider elementary (factorization, expansion, trigonometric identities,
etc.), the simplification process is computationally quite complex. In fact, it can
be shown that no algorithm can perform all the simplifications we might hope to
do with a CAS. To complicate matters, the goal of simplification is difficult to
define precisely. What is simple to one person may not be simple to another.

In the language of computer science, the simplification problem is recursively undecidable.
For a discussion of this theorem see the paper by Moses [9]. This paper contains an interesting
discussion of the simplification problem.

For a mathematical scientist the important question is “which simplifications
are possible with a CAS?” This question, which is not an easy one to answer,
helps put in context the role of a CAS in the problem solving process. While
a CAS usually cannot discover the sequence of steps needed for an involved
derivation of a mathematical result, it can do many of the local simplifications
which are encountered in the course of the derivation. By combining a number of
simplification commands, it is sometimes possible to write a program to verify
all the details of an involved derivation.

Generally speaking, a CAS is good at simplifying expressions that contain
explicit forms of elementary functions and involve complicated but straightforward
manipulations. Most systems contain commands for expansion, some forms of
substitution, simplification of rational expressions, radical simplification and
transcendental simplification. If the simplification is not straightforward, these
systems are less useful. For example, none of the systems available to the author
is able to perform the simplification

$$\log\tan\left(x + \frac{\pi}{4}\right) - \operatorname{arcsinh}(\tan 2x) = 0, \qquad -\frac{\pi}{4} < x < \frac{\pi}{4},$$

which requires a number of different transformations. In addition, simplification
of expressions which contain indefinite sums or series (using the Σ symbol or
ellipses) and other expressions of a more abstract character is usually not
possible with a CAS. For example, the MACSYMA system is able to solve the
Bessel differential equation

$$x^2 y'' + x y' + (x^2 - p^2)\, y = 0 \qquad (6)$$

in terms of the Bessel functions J_p(x) and Y_p(x) or in terms of infinite series
representations of these functions, but is unable to perform the simplification
which shows that the series

$$J_p(x) = \sum_{k=0}^{\infty} \frac{(-1)^k (x/2)^{2k+p}}{k!\,(k+p)!} \qquad (7)$$

satisfies differential equation (6).
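Neither simplification is proved by a numerical test, but both are easy to check numerically. The sketch below is our own plain-Python illustration: it evaluates the residual of the log-tan identity, and it substitutes a truncated form of series (7), differentiated term by term, into the left-hand side of (6). It assumes an integer order p ≥ 0 (so that (k+p)! is an ordinary factorial) and x ≠ 0; the truncation length `terms=40` is an arbitrary choice.

```python
import math

def logtan_residual(x):
    """log tan(x + pi/4) - arcsinh(tan 2x); zero for -pi/4 < x < pi/4."""
    return math.log(math.tan(x + math.pi / 4)) - math.asinh(math.tan(2 * x))

def bessel_series(p, x, terms=40):
    """Truncated series (7) for integer p >= 0 and x != 0, together with its
    first two derivatives, obtained by differentiating each power x^n exactly."""
    y = dy = d2y = 0.0
    for k in range(terms):
        n = 2 * k + p                                  # power of x in term k
        c = (-1) ** k / (math.factorial(k) * math.factorial(k + p) * 2 ** n)
        y += c * x ** n
        dy += c * n * x ** (n - 1)
        d2y += c * n * (n - 1) * x ** (n - 2)
    return y, dy, d2y

def bessel_residual(p, x):
    """Left-hand side of (6) evaluated on the truncated series."""
    y, dy, d2y = bessel_series(p, x)
    return x * x * d2y + x * dy + (x * x - p * p) * y

for x in (-0.7, 0.0, 0.7):
    print("log-tan residual at", x, logtan_residual(x))
for p in range(3):
    print("Bessel residual, p =", p, bessel_residual(p, 1.5))
```

Both residuals come out at rounding-error level. The symbolic reason for the second is that, after substitution, the coefficient of x^(2k+p) collects c_k(n^2 - p^2) = 4k(k+p)c_k from the derivative terms and c_(k-1) from the x^2 y term, and these cancel; that index-shifting argument is exactly the kind of manipulation the text says a CAS does not perform.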

Each new user brings to a CAS a conceptual model of simplification which
is based on experience with pencil and paper calculations. For many people this
model may diverge radically from what is currently possible with a CAS. For
example, some mathematical scientists may consider the simplification mentioned
above which involves the series (7) fair game for a CAS and wonder why such
a simple manipulation cannot be done. Indeed, glancing through an applied
mathematics textbook, one finds many manipulations involving more general
expressions of this type which also cannot be done with a CAS.

One must always be careful when making a claim that a particular CAS is unable
to carry out a certain computation. Computer algebra systems are complicated,
with many hundreds of commands, and it is difficult for a person to know exactly what a CAS
can and cannot do. When we say a CAS cannot perform a simplification, we mean the author
was unable to find one or two general purpose commands which are able to perform this
simplification. Of course, it is possible to write a lengthy program to perform this
particular simplification. However, if one must go to all this trouble, the
simplification might just as well be done with pencil and paper.

In order to develop confidence in the capabilities of a CAS, it is important
to assess realistically what types of simplifications a CAS is able to do. A good
starting point for selecting examples is the symbolic calculations found in
elementary textbooks on trigonometry, algebra, calculus and differential equations.
Clearly, not all of these manipulations are appropriate for a CAS. By presenting
these examples, we examine the types of manipulations a CAS is likely to
encounter and raise important questions about the capabilities of these systems.
In addition, we develop a connection between hand calculation and calculation
with a CAS.

The Nature of Mathematical Knowledge in a CAS

The view of mathematics programmed into current computer algebra systems
is reminiscent of the approach taken by mathematicians in the eighteenth
century. Like the mathematicians of that time, a CAS views the concepts
of calculus primarily as an extension of the formal rules of algebra. Computer
algebra systems have almost no knowledge about the underlying concepts of
calculus such as rational and irrational numbers, the meaning of limits, the
continuity of a function, the derivative of a function in terms of limits, the
relationship between integrals and areas, or the convergence of infinite series. For
example, in a CAS, the derivative is defined by a collection of transformation
rules instead of by the limit definition $f'(x) = \lim_{h \to 0} (f(x+h) - f(x))/h$.
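To make "a collection of transformation rules" concrete, here is a minimal sketch in Python of a rule-based differentiator (a start on item 15 of the exercise list in Section 6). The tuple representation of expressions and the function names are hypothetical choices for this illustration, not any system's internals; note that no limit is ever taken, and `evaluate` exists only so the result can be checked numerically.

```python
import math

# Hypothetical representation for this sketch: numbers, symbol names (str),
# or tuples ("+", a, b, ...), ("*", a, b), ("^", base, number),
# ("sin", u), ("cos", u), ("exp", u).
def diff(e, x):
    """Differentiate e with respect to the symbol x using rewrite rules only."""
    if isinstance(e, (int, float)):
        return 0                                   # constant rule
    if isinstance(e, str):
        return 1 if e == x else 0                  # dx/dx = 1; other symbols are constants
    op, *args = e
    if op == "+":                                  # sum rule
        return ("+",) + tuple(diff(a, x) for a in args)
    if op == "*":                                  # product rule (binary products only)
        u, v = args
        return ("+", ("*", diff(u, x), v), ("*", u, diff(v, x)))
    if op == "^":                                  # power rule with a numeric exponent
        u, n = args
        return ("*", ("*", n, ("^", u, n - 1)), diff(u, x))
    if op == "sin":                                # chain rule for sin, cos, exp
        return ("*", ("cos", args[0]), diff(args[0], x))
    if op == "cos":
        return ("*", ("*", -1, ("sin", args[0])), diff(args[0], x))
    if op == "exp":
        return ("*", e, diff(args[0], x))
    raise ValueError(f"no differentiation rule for {op!r}")

def evaluate(e, env):
    """Numerically evaluate an expression tree, taking symbol values from env."""
    if isinstance(e, (int, float)):
        return e
    if isinstance(e, str):
        return env[e]
    op, *args = e
    vals = [evaluate(a, env) for a in args]
    if op == "+":
        return sum(vals)
    if op == "*":
        return math.prod(vals)
    if op == "^":
        return vals[0] ** vals[1]
    return {"sin": math.sin, "cos": math.cos, "exp": math.exp}[op](vals[0])

f = ("*", ("^", "x", 2), ("sin", "x"))             # x^2 sin(x)
df = diff(f, "x")                                  # unsimplified tree for 2x sin(x) + x^2 cos(x)
print(df)
```

The output tree is deliberately left unsimplified (it still contains factors such as 1 and x^1); deciding which simplification rules to apply afterwards is the simplification-context question discussed above.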

For the most part, this lack of analytical knowledge does not hamper the use
of a CAS. However, it does mean a CAS can occasionally produce an unexpected
result. For example, consider the MACSYMA session in Fig. 4. At line c1,
we ask MACSYMA to compute the power series representation for the
expression 1/(1 + x). The result is returned in d1, where the summation index i1
has been generated by the system for this expression. At line c2, we ask
MACSYMA to substitute x = 1 into both sides of the expression and simplify the
sum. MACSYMA tries to evaluate the series and realizes that it does not
converge (d2). At line c3, we ask MACSYMA to find the power series
representation for the function log(1 + x). The result is returned in line d3,
where the index i2 has been generated by the system. At line c4, we ask
MACSYMA to evaluate the series at x = 2, which is outside the interval of
convergence. Since MACSYMA cannot evaluate this series, it simply returns the
series as a result. In this case, the series does not converge since the general
term of the series does not converge to zero. Unfortunately, MACSYMA does not
recognize this fact and returns a divergent series. This example emphasizes that
mathematical reasoning is not just a matter of blind manipulation. Successful
use of a CAS requires a good understanding of the underlying mathematics.

    (c1) 1/(1+x)=powerseries(1/(1+x),x,0);
    (d1)     1/(x + 1) = sum((-1)^i1*x^i1, i1, 0, inf)
    (c2) ev(d1,x=1,simpsum);
    (d2)     1/2 = undefined
    (c3) log(1+x)=powerseries(log(1+x),x,0);
    (d3)     log(x + 1) = - sum((-1)^i2*x^i2/i2, i2, 1, inf)
    (c4) ev(d3,x=2,simpsum);
    (d4)     log(3) = - sum((-1)^i2*2^i2/i2, i2, 1, inf)

Figure 4: Convergence of series in MACSYMA.

Reference [7] illustrates some of the different types of manipulations found in mathematical
reasoning.

This refers to the user's view. Many of the algorithms which make a CAS work are
part of twentieth century mathematics and computer science. Nevertheless, the primary
applications of symbolic manipulation systems are to mathematics developed in the eighteenth
and nineteenth centuries.

However, it is interesting to note that it is possible to find the derivative of functions
in a CAS using the limit definition. This is not because the system has an understanding of the
limit process. It follows instead from the transformation rules for limits, which use L'Hospital's
rule or a similar construction using derivatives for the calculation of limits involving
indeterminate forms.
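A small numerical experiment makes the divergence in Fig. 4 visible. The sketch below is our own plain-Python illustration (not MACSYMA): it sums the series returned in d3. Inside the interval of convergence the partial sums approach log(1 + x); at x = 2 the general term grows like 2^i/i, so the partial sums swing further apart instead of settling.

```python
import math

def log_series_term(x, i):
    """i-th term (i >= 1) of the series in (d3): -(-1)^i x^i / i."""
    return -((-1) ** i) * x ** i / i

def partial_sum(x, n):
    """Sum of the first n terms of the series in (d3)."""
    return sum(log_series_term(x, i) for i in range(1, n + 1))

# Inside the interval of convergence (|x| < 1) the partial sums approach log(1 + x).
print(partial_sum(0.5, 60), math.log(1.5))

# At x = 2 the general term does not tend to zero, so the partial sums oscillate wildly.
for n in (10, 11, 20, 21):
    print(n, partial_sum(2.0, n))
```

The test that MACSYMA did not apply here is the elementary one: if the general term does not tend to zero, the series diverges.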

5  Symbolic  Programming. 

Computer algebra systems can be used in both an interactive mode and a
programming mode. The interactive mode is illustrated by the examples in Fig. 2,
Fig. 3 and Fig. 4. The programming mode makes it possible to implement
mathematical algorithms in a high level programming language.

Like numerical computer programs, programs in a CAS language utilize
assignment statements, conditional statements, loops and subprograms. Since
there are differences between numerical programming and symbolic
programming, it often takes time for a numerical programmer to feel comfortable
with symbolic programming. These differences include:

•  In  a  CAS  language,  variables  can  represent  programming  variables  or 
mathematical  variables  in  an  expression. 

•  The  primary  data  type  in  symbolic  programming  is  the  mathematical 
expression.  The  two  most  important  data  structures  are  lists  and  sets. 
Arrays  are  also  used  in  symbolic  programming  but  they  do  not  play  the 
essential  role  they  play  in  numerical  programming. 

• Symbolic programs can utilize mathematical knowledge by invoking
mathematical operators which analyze or manipulate mathematical expressions.

• Recursive programming techniques are often utilized (instead of loops)
to solve symbolic mathematical problems. Many mathematical scientists
(particularly FORTRAN programmers) may not be familiar with recursion.

Some computer algebra systems (MAPLE and muMath) are written primarily in the high
level computer language which comes with the system.

We refer to programs written in languages such as FORTRAN, Pascal or C as numerical
programs to distinguish them from programs written in the high level language of a CAS. Of
course, these “numerical” languages have other data types and can be used for other types of
programming. For the solution of mathematical problems, it is their numerical capabilities
which are most important.
• Some computer algebra systems (MACSYMA, REDUCE, and MATHEMATICA)
have pattern matching facilities which provide a way to add
new simplification rules to a CAS. In some instances, this capability can
eliminate the need for involved programs based on conditional statements,
loops and recursion.

• In mathematical discussions there is a subtle distinction between the way
the variable assignment and substitution operations are used. It is also
important to determine when each of these two operations is appropriate
in a symbolic program.

• Program efficiency (for CPU time and memory allocation) is an important
issue for the symbolic programmer. Programs written in a CAS language
can be unbearably slow. Reasons for this include:

1. CAS languages are usually interpreted rather than compiled. Each statement
in a program must be translated each time it is used.

2. Most arithmetic in a CAS is done with rational numbers, which can
have an arbitrary number of digits, rather than floating-point (real)
numbers, which have a fixed precision. This increases the CPU time for
arithmetic operations.

3. The algorithms to perform some mathematical operations (limits,
integration, solution of ordinary differential equations, radical
simplification, etc.) are time consuming.

4. Automatic simplification rules are applied during the execution of
each statement in a program.

Programs in a CAS language can also require a large amount of computer
memory. Reasons for this include:

1. The storage of a mathematical expression requires much more computer
memory than the storage of a real number in a numerical program.

2. Some mathematical operations (expansion, differentiation, determinant
calculation) often produce very large expressions which may
eventually simplify to much smaller expressions. This phenomenon,
known as intermediate expression swell, can significantly increase the
memory requirements for a program.
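The cost of exact rational arithmetic (reason 2 above) is easy to observe outside a CAS. The sketch below uses Python's standard `fractions.Fraction` as a stand-in for a CAS's rational numbers, which is our assumption for illustration: it sums the harmonic series exactly and in floating point, and prints how many digits the exact numerator and denominator need.

```python
from fractions import Fraction

def harmonic_exact(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact rational, as a CAS would keep it."""
    total = Fraction(0)
    for k in range(1, n + 1):
        total += Fraction(1, k)      # each addition reduces to lowest terms (a gcd computation)
    return total

def harmonic_float(n):
    """The same sum in fixed-precision floating point."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 50, 200):
    h = harmonic_exact(n)
    # Digit counts of the exact numerator and denominator grow with n;
    # the float result always occupies the same fixed-size storage.
    print(n, len(str(h.numerator)), len(str(h.denominator)), float(h) - harmonic_float(n))
```

The exact result carries a steadily growing number of digits, and every addition pays for a gcd to keep the fraction reduced, while the floating-point sum costs one fixed-size operation per term and differs only by rounding.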

Programming examples and exercises are chosen to illustrate the programming
techniques and data structures needed to implement symbolic mathematical
algorithms in a CAS language. A good source of programming exercises is
the collection of elementary mathematical operators which already exist in a CAS.
In the next section, we suggest a collection of exercises which illustrate some of
the special problems associated with symbolic computation.
6 Algorithms For Symbolic Computation.

To use a CAS effectively, it is important to have some understanding of how
a CAS works. By examining algorithms to perform symbolic computation, we
clarify the important computational issues faced by the field and develop a
sense of what manipulations are appropriate for a machine. Since the audience
is composed of mathematical scientists who will use a CAS to solve problems
rather than developers who design computer algebra systems, simple, convincing
algorithms are more appropriate than the most efficient algorithms currently
available. The algorithms for the following operators illustrate many issues
which arise in symbolic computation:

1. An operator freeof(f,x) which determines if an expression f contains a
variable x.

2. An operator to find a list of all variables and function names which occur
in a mathematical expression.

3. An operator to find the power set of a set.

4. An operator which determines if an expression f is a polynomial
in x, and, if so, returns the degree of the polynomial.

5. An operator coefficient(f,x,k) which determines the coefficient of x^k in a
polynomial f.

6. An operator to perform polynomial division for polynomials with one or
several variables.

7. An operator to compute the greatest common divisor of two polynomials
with one or several variables using Euclid's algorithm.

8. An operator to find the square-free factorization of a polynomial.

9. An operator to find the partial fraction expansion of a rational expression.

10. An operator to factor polynomials with one or several variables with
integer coefficients using Kronecker's algorithm.

11. An operator to expand trigonometric expressions using the angle addition
formulas and the multiple angle formulas.

12. An operator to reduce trigonometric expressions using the reduction
formulas for products and powers of trigonometric expressions.

13. An operator to simplify all occurrences of i^n (i^2 = -1, n an integer) in an
expression.

14. An operator to implement the rational substitution operation found in the
MACSYMA system.

15. A differentiation operator.

16. An integration operator which applies the substitution method to perform
integration.

17. An operator which determines the real and imaginary parts of a complex
expression.
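As an indication of the scale of the first exercises, here is one possible sketch of items 1 to 3 in Python. The tuple representation of expressions, the `ARITH_OPS` set and the function names are our own hypothetical choices; item 2 returns a set rather than a list, and `freeof` inspects arguments only, not function names.

```python
# Hypothetical representation for this sketch: an expression is a number, a
# symbol name (str), or a tuple (head, arg1, arg2, ...) where head is "+",
# "*", "^" or a function name such as "sin".
ARITH_OPS = {"+", "*", "^"}

def freeof(f, x):
    """Item 1: True if x (a variable or any subexpression) does not occur in f."""
    if f == x:
        return False
    if isinstance(f, tuple):
        return all(freeof(arg, x) for arg in f[1:])
    return True

def variables(f):
    """Item 2: the set of variables and function names occurring in f."""
    if isinstance(f, str):
        return {f}
    if isinstance(f, tuple):
        head, *args = f
        names = set() if head in ARITH_OPS else {head}
        for arg in args:
            names |= variables(arg)
        return names
    return set()                                  # numbers contribute no names

def power_set(s):
    """Item 3: all subsets of the list s, built recursively."""
    if not s:
        return [[]]
    rest = power_set(s[1:])                       # subsets that omit s[0] ...
    return rest + [[s[0]] + sub for sub in rest]  # ... plus those that include it

expr = ("+", ("*", 2, ("sin", "x")), ("^", "y", 2))   # 2 sin(x) + y^2
print(freeof(expr, "z"), freeof(expr, "x"), variables(expr), power_set([1, 2]))
```

All three are recursive, which is typical of this list: the recursion follows the structure of the expression (or set) rather than an index, the point that tends to surprise programmers used to loops.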

Most of these examples involve relatively simple symbolic programming
techniques and none of them involves advanced mathematics. Nevertheless, a
program to implement some of these operators can be surprisingly challenging to
the beginning symbolic programmer.
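For example, items 6 and 7 can be sketched for polynomials in one variable as follows. This is our own Python illustration under stated assumptions: coefficients are stored as a list with the lowest degree first, arithmetic uses exact `fractions.Fraction` values, and the multivariate versions the list also asks for (which need a term order) are not attempted.

```python
from fractions import Fraction

def trim(p):
    """Remove zero leading coefficients in place (p is lowest degree first)."""
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Item 6 (one variable): quotient and remainder of a divided by b."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)          # degree of the next quotient term
        factor = a[-1] / b[-1]
        q[shift] = factor
        for i, c in enumerate(b):        # subtract factor * x^shift * b
            a[i + shift] -= factor * c
        trim(a)                          # the leading term has cancelled exactly
    return trim(q), a

def poly_gcd(a, b):
    """Item 7 (one variable): monic gcd by Euclid's algorithm."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while b:
        _, r = poly_divmod(a, b)
        a, b = b, r
    if not a:
        return []
    lead = a[-1]
    return [c / lead for c in a]

# gcd(x^2 - 1, x^2 + 2x + 1) = x + 1
print(poly_gcd([-1, 0, 1], [1, 2, 1]))
```

Even on this small example the intermediate remainder (-2x - 2) carries a coefficient that disappears only after normalization; on larger inputs the intermediate rational coefficients can grow considerably, one reason practical systems use more refined gcd algorithms than plain Euclid.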

We have purposely omitted from this list the more modern algorithms which
make a CAS work efficiently. This material requires a strong background in
modern algebra and is more appropriate for an advanced course in computer
algebra. We have also purposely omitted more advanced areas of applied
mathematics. Unless one is thoroughly familiar with some area of mathematics, it
can be difficult to appreciate the point of an example. Once a mathematical
scientist has some experience with symbolic programming, the techniques can
be applied to more advanced problems.

7  Conclusion. 

It is often said that computer algebra systems have the potential to revolutionize
the way we do symbolic mathematics. Nevertheless, to date, only a fraction of
the mathematical scientists who could profit by using this technology use a CAS
in a significant way. We believe that this is partially due to the fact that many
mathematical scientists do not understand the possibilities (and limitations) of
computer symbolic computation and do not believe a CAS can help in their
work. We also believe there is more to the effective use of a CAS than reading
a system manual and seeing a few impressive examples. In this paper we have
discussed an applied computer algebra course which can provide the technical
background to use this technology effectively.
ABSTRACT Groebner bases are remarkable sets of polynomials which permit effective
manipulation of multivariate polynomials. In spirit, Groebner bases apply univariate
polynomial techniques to multivariate polynomials. The theory and techniques which
have grown up around Groebner bases are an important branch of computational
algebra. While many of the techniques associated with Groebner bases are simple
enough to be taught in high school, an undergraduate abstract algebra course is
required to begin appreciating the algebra applications. Outside of algebra, Groebner
bases have applications to robotics, computational geometry, geometric theorem proving
and other areas. Applications to surface modeling and cryptography are under
investigation. Groebner bases are remarkably poorly known within the algebra research
community, despite the fact that the associated algorithms are high school algebra [1] yet
provide systematic answers to important questions which most algebraists have no
other way to answer.

INTRODUCTION Groebner bases are the invention of Bruno Buchberger [9]. The
present importance of Groebner bases results from the conjunction of Buchberger's
seminal work with the body of techniques which has developed around his
work. Buchberger theory is an appropriate name for the area, in the same way that
Galois theory refers to a body of techniques.¹ At present, pure mathematicians
primarily use Groebner bases to compute examples. Inevitably, Buchberger theory, like
Galois theory, will be freely used in proofs.²

The four cornerstones of Buchberger theory are:

LEADING TERMS  REDUCTION

BASIS TEST  BASIS CONSTRUCTION

The rest of the introduction airs algebra applications of Buchberger theory. Those
unfamiliar with the concepts may still understand the sections 1 LEADING TERMS,
2 REDUCTION, and 3 GROEBNER BASIS TEST and CONSTRUCTION, which are
honestly elementary. The last harangue, 4 WHERE THE ACTION ISN'T, passionately
portrays the prevailing pitiful, paltry position of constructive algebra among North
American academic algebraists.

¹ PROPAGANDA: Buchberger theory is as fundamental as, and more elementary than,
Galois theory and should be taught in advanced undergraduate algebra courses and the
first year graduate algebra course.

² Which gets us to Hironaka theory. Ideal bases with special properties are not new.
Among others, there are Ritt bases [50], [51] and standard bases [23] which are already
used in proofs.

ALGEBRA APPLICATIONS When using Groebner bases, one typically starts with a
finite set of polynomials (and an ordering on the monomials of the polynomial ring) and
constructs a Groebner basis for the ideal generated by the original polynomials. One
customarily gets information from a Groebner basis by one of two methods:

I: simple Inspection of the Groebner basis

R: a constructive technique called Reduction

Constructing the Groebner basis is generally tedious, i.e. computationally expensive
[38]. Reduction is much easier. Reduction has the flavor of the Euclidean algorithm
and is occasionally described as the generalization of the Euclidean algorithm to
several variables.³ Here are several algebra applications of Buchberger theory. Each
application is preceded by R for Reduction or I for Inspection, according to how one gets
information from the Groebner basis. We use the following notation: A = R[X_1, ..., X_n]
is a polynomial ring over the field R, a is an element of A and F is a finite subset of
A. <F> is the ideal in A generated by F, R[F] the subalgebra of A generated by F,
and R(F) the subfield of R(X_1, ..., X_n) generated by F.

R Determine if $a \in \langle F \rangle$.

I Determine if $a \in \sqrt{\langle F \rangle}$, the radical of $\langle F \rangle$.

R Determine if $a \in R[F]$.

I Determine if $a \in R(F)$.

I Determine a generating set for $(\langle F \rangle : a) = \{ b \in A \mid ba \in \langle F \rangle \}$.

I If $J$ is another ideal in $A$ with an explicit finite generating set, determine a generating set for $\langle F \rangle \cap J$.

I If $1 \le m < n$ and $B = R[X_1, \ldots, X_m]$, determine a generating set for the ideal $B \cap \langle F \rangle$ in $B$.

I Find the relations among the elements of $F$.

I Determine $[R(X_1, \ldots, X_n) : R(F)]$, meaning the index if algebraic, the transcendence degree if not.

R Be able to work effectively in $A/\langle F \rangle$ by having distinguished coset representatives in $A$ for elements of $A/\langle F \rangle$. For any element of $A$, be able to determine the distinguished coset representative to which it is equivalent.

There are aspects of Buchberger theory which have the spirit of construction lines in plane geometry. For example, in most of the above applications, one finds a Groebner basis for a cleverly chosen ideal in the ring A with additional indeterminates adjoined. Although the techniques are elementary, they are tedious when done by hand for all but the smallest examples. A number of computer algebra systems (Macaulay, MACSYMA, MAPLE, Mathematica, REDUCE, Scratchpad II, etc.) are capable of executing various aspects of Buchberger theory.

³See the end of section 2 for more about this.

Buchberger theory has many generalizations, for example to free modules over polynomial rings [3] and to rings with suitable filtrations or valuations [53], [67]. Groebner bases for free modules allow effective computation of syzygies, free resolutions, Hilbert functions and more.


1 LEADING TERMS Portions of Buchberger theory are extensions of univariate polynomial techniques to multivariate polynomial rings. Univariate polynomials have a natural expression in terms of descending term degree. The degree of the largest nonzero term is the degree of the polynomial, and it plays a key role in univariate polynomial theory. The first difficulty with multivariate polynomials is the lack of a natural leading term. The first cornerstone of Buchberger theory is a method for recovering a notion of leading term. Buchberger’s method, which we present here, involves orderings on the monomials of the polynomial ring. Specific orderings on sets of monomials, particularly the lexicographic order, were used long before Buchberger’s work. Buchberger isolated the needed properties of an abstract ordering.⁴ One approach to generalizing Buchberger’s work has been to develop alternative notions of leading terms that are not based on orderings of monomials.

1.1 DEFINITION A multiplicative order on monomials of a polynomial ring is a total order on the monomials satisfying:

1.1.a $1 \le m$ for all monomials $m$

1.1.b if $m_1 < m_2$ then $m_1 m_3 < m_2 m_3$ for all monomials $m_1, m_2, m_3$

Lexicographic order is an easy example of a multiplicative order. In this order, $X^a Y^b \cdots Z^c > X^d Y^e \cdots Z^f$ if the leftmost nonzero entry of $(a - d, b - e, \ldots, c - f)$ is positive. The reverse lexicographic order, where $X^a Y^b \cdots Z^c > X^d Y^e \cdots Z^f$ if the rightmost nonzero entry of $(a - d, b - e, \ldots, c - f)$ is negative, is not a multiplicative order. For instance, it makes $1 > Z$, which violates 1.1.a.

Two  other  multiplicative  orders: 

1.2 Compare by total degree, break ties lexicographically.

1.3 Compare by total degree, break ties reverse lexicographically.
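The orders (1.2) and (1.3) can be compared mechanically. The following is a minimal Python sketch, an illustration only and not taken from the paper. It encodes a monomial $X^aY^b\cdots Z^c$ as its exponent tuple and each order as a sort key; all function names are hypothetical. It reproduces the $XZ$ versus $Y^2$ comparison discussed next, and it shows why pure reverse lexicographic order fails 1.1.a.

```python
# Monomials are exponent tuples: X*Z in variables (X, Y, Z) is (1, 0, 1).
# Each order is a sort key (an illustrative encoding, not a library API):
# m1 > m2 exactly when key(m1) > key(m2).

def lex_key(m):
    """Lexicographic order: compare exponents from the left."""
    return tuple(m)


def deglex_key(m):
    """Order (1.2): total degree first, ties broken lexicographically."""
    return (sum(m), tuple(m))


def revlex_key(m):
    """Pure reverse lexicographic order: m1 > m2 when the right-most
    nonzero entry of m1 - m2 is negative."""
    return tuple(-e for e in reversed(m))


def degrevlex_key(m):
    """Order (1.3): total degree first, ties broken reverse lexicographically."""
    return (sum(m), revlex_key(m))


def greater(key, m1, m2):
    """True when m1 is strictly larger than m2 under the order `key`."""
    return key(m1) > key(m2)


X, Y, Z = (1, 0, 0), (0, 1, 0), (0, 0, 1)
XZ, Y2, ONE = (1, 0, 1), (0, 2, 0), (0, 0, 0)

print(greater(deglex_key, XZ, Y2))     # True:  XZ > Y^2 under (1.2)
print(greater(degrevlex_key, XZ, Y2))  # False: XZ < Y^2 under (1.3)
print(greater(revlex_key, ONE, Z))     # True:  1 > Z, so 1.1.a fails
```

Comparing whole keys keeps each order to one line. An implementation tuned for speed would more likely compare exponent vectors directly.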

Mathematicians’ usual initial reaction is that (1.2) and (1.3) must give essentially the same order, possibly after renaming the variables. However, in three variables X, Y, Z:

$X > Y > Z$ and $XZ > Y^2$ in the (1.2) order

$X > Y > Z$ and $XZ < Y^2$ in the (1.3) order
This essential difference cannot be renamed away.⁵ In one variable, the usual degree order is the unique multiplicative order on monomials; in general there are infinitely many multiplicative orders. Often, the application for which one is using Buchberger theory constrains the multiplicative order which may be used. The lexicographic order is an easily implemented multiplicative order for computer algebra systems, and it is suitable for most, but not all, applications. However, other orders, in particular the (1.3) order, generally require less computation [6].

⁴Not being a historian, I cannot say whether these properties had been isolated earlier.
⁵A few applications of Buchberger theory require the (1.3) order.

Once one has a multiplicative order, the univariate case may be imitated to some degree. For example, polynomials may be written with the monomials in descending order. The largest term with nonzero coefficient is dubbed the leading term of the polynomial. The coefficient (monomial, exponent) of the leading term of a polynomial is called the leading coefficient (monomial, exponent) of the polynomial.

Multiplicative orders have the important property of being well orderings. Consequently, processes such as reduction halt.


2 REDUCTION Let us begin with polynomials in one variable and the familiar process of dividing one polynomial, $f_0(X)$, by another, $g_0(X)$. The aim is to evolve the process of reduction from the process of polynomial division. Hence, the polynomial division process will be considered in detail. Suppose

$$f_0(X) = 6X^{17} + \text{lower degree stuff}$$

$$g_0(X) = 2X^{12} + \text{lower degree stuff}$$

First step: $6X^{17}/2X^{12} = 3X^5$, and $f_1(X)$ is defined as $f_0(X) - 3X^5 g_0(X)$. The step is imitated with $f_1(X)$ and $g_0(X)$, assuming $f_1(X)$ has degree at least as large as $g_0(X)$. Let us tweak the division process. Suppose that at the second step $g_0(X)$ may be replaced by another polynomial $g_1(X)$. In other words, the first step is imitated with $f_1(X)$ and $g_1(X)$, assuming $f_1(X)$ has degree at least as large as $g_1(X)$. Suppose that at each step the polynomial $g_j(X)$ may be replaced by another polynomial. Suppose S is a given set of polynomials, where at each step⁶ $g_j(X)$ may be chosen as any polynomial in S. This process is the reduction of $f_0(X)$ over S. When must it halt? It halts when an $f_j(X)$ has been reached which has smaller degree than all polynomials in S. Polynomial division yields a remainder which is uniquely determined by the divisor and the dividend. By contrast, the example where $f_0(X) = X$ and $S = \{X, X + 1\}$ shows that the remainder of a reduction can depend upon which elements of S are chosen as the $g_j(X)$'s: reducing by $X$ gives $0$, while reducing by $X + 1$ gives $-1$.⁷ In one variable, halting is apparently based on degree. The halting condition may be rephrased: the reduction process halts when an $f_j(X)$ has been reached whose lead monomial is not divisible by the lead monomial of any polynomial in S. The univariate reduction process is now easily generalized to polynomial rings in several variables with a given multiplicative order.
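The univariate reduction over a set S can be sketched in a few lines, including its dependence on which $g_j(X)$ is chosen. In this minimal Python illustration (not from the paper), a polynomial is a dict mapping exponents to coefficients. The hypothetical `choose` argument decides which eligible element of S is used at each step.

```python
from fractions import Fraction


def degree(p):
    """Degree of a nonzero univariate polynomial {exponent: coefficient}."""
    return max(p)


def reduce_univariate(f, S, choose=lambda candidates: candidates[0]):
    """Reduce f over the set S and return the final reductum.

    At each step `choose` picks one element of S whose degree does not
    exceed the current degree; the process halts when none qualifies.
    """
    f = {e: Fraction(c) for e, c in f.items() if c}
    while f:
        d = degree(f)
        candidates = [g for g in S if degree(g) <= d]
        if not candidates:
            break                          # degree is below every g in S
        g = choose(candidates)
        q = f[d] / Fraction(g[degree(g)])  # coefficient of the quotient term
        shift = d - degree(g)              # quotient term is q * X**shift
        for e, c in g.items():             # f <- f - q * X**shift * g
            k = e + shift
            f[k] = f.get(k, 0) - q * c
            if f[k] == 0:
                del f[k]
    return f


X = {1: 1}
X_plus_1 = {1: 1, 0: 1}
print(reduce_univariate(X, [X, X_plus_1]))                          # {}
print(reduce_univariate(X, [X, X_plus_1], choose=lambda c: c[-1]))  # {0: Fraction(-1, 1)}
```

Choosing $X$ first leaves remainder 0, and choosing $X + 1$ leaves $-1$. Both are legitimate reductions of $X$ over the same set.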


⁶Including the first!

⁷Looking ahead: when S is a Groebner basis, the remainder of complete reduction is independent of how the $g_j(X)$'s are chosen from S.
Let A be a polynomial ring with a given multiplicative order. As indicated toward the end of the previous section, the multiplicative order allows us to speak of the leading term, leading coefficient, leading monomial, etc. of a polynomial. Let $f_0$ be a polynomial in A and let S be a subset of A. The following inductively defined process is the reduction of $f_0$ over S.

1 If $f_j = 0$, halt.

2 If the leading monomial of $f_j$ is not divisible by the leading monomial of any polynomial in S, halt.

3 If this step has been reached⁸, there is a polynomial $s_j$ in S whose lead monomial divides the lead monomial of $f_j$. Find a polynomial $q_j$ where the lead term of $q_j$ times the lead term of $s_j$ equals the lead term of $f_j$.

4 Set $f_{j+1} = f_j - q_j s_j$.

5 GOTO step (1).

Conventionally $q_j$ is chosen as the polynomial consisting of the single term whose coefficient is the lead coefficient of $f_j$ divided by the lead coefficient of $s_j$ and whose monomial is the lead monomial of $f_j$ divided by the lead monomial of $s_j$. The coefficient division can be performed because the coefficients lie in a field.⁹ The monomial division can be performed by the assumption stated at the start of step (3). More elaborate choices of $q_j$ may lead to computational optimization in the reduction process. Allowing general $q_j$'s in the definition of the reduction process has advantages for developing the general theory. Further restrictions may always be placed on the $q_j$'s in implementations.
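Steps (1)-(5), with the conventional single-term choice of $q_j$, translate almost line for line into code. The Python sketch below is a minimal illustration under stated assumptions and is not the paper's implementation. Polynomials are dicts from exponent tuples to rational coefficients. The multiplicative order is assumed to be lexicographic, so Python's built-in tuple comparison serves as the order. The element $s_j$ is simply the first eligible element of S.

```python
from fractions import Fraction


def lead(p):
    """Lead monomial of a nonzero polynomial.
    Assumption: lexicographic order, i.e. Python tuple comparison."""
    return max(p)


def divides(m, n):
    """True when monomial m divides monomial n."""
    return all(a <= b for a, b in zip(m, n))


def reduce_lead(f, S):
    """Reduce f over S by steps (1)-(5) and return the final reductum."""
    f = {m: Fraction(c) for m, c in f.items() if c}
    while f:                                        # step 1: halt if f_j = 0
        lm = lead(f)
        s = next((s for s in S if divides(lead(s), lm)), None)
        if s is None:                               # step 2: no lead monomial divides
            break
        ls = lead(s)                                # step 3: q_j = c * mono
        c = f[lm] / Fraction(s[ls])
        mono = tuple(a - b for a, b in zip(lm, ls))
        for m, a in s.items():                      # step 4: f_{j+1} = f_j - q_j s_j
            k = tuple(x + y for x, y in zip(m, mono))
            v = f.get(k, 0) - c * a
            if v:
                f[k] = v
            else:
                f.pop(k, None)
    return f                                        # step 5 is the while loop


# Variables (X, Y) with X > Y.  X is not reduced at all by the Groebner basis {Y}:
print(reduce_lead({(1, 0): 1}, [{(0, 1): 1}]))      # {(1, 0): Fraction(1, 1)}
# X^2 Y + Y^2 over {XY - 1}: one step gives X + Y^2, whose lead X is irreducible.
print(reduce_lead({(2, 1): 1, (0, 2): 1}, [{(1, 1): 1, (0, 0): -1}]))
```

Only the lead term is ever examined. A lower term that some lead monomial divides is left alone, which is exactly the distinction between reduction and complete reduction.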

The $f_j$'s which result are called reductums of $f_0$ (over S).¹⁰ The reduction process always halts. The last $f_j$ reached is called a final reductum of $f_0$ (over S). The final reductum is the generalization of the remainder in polynomial division. As noted earlier, the final reductum is not an invariant of $f_0$ and S. If $f_j$ is a reductum of $f_0$ over S, then $f_0 - f_j$ lies in the ideal generated by S.

Suppose  S  lies  in  an  ideal  I .  It  is  easy  to  show  the  equivalence  of: 

The  lead  monomial  of  each  non-zero  element  of  I  is  divisible  by  the  lead 
monomial  of  some  element  of  S  . 

Each  element  of  I  has  a  reduction  over  S  with  final  reductum  zero. 

For each element of I, all reductions over S have final reductum zero.


⁸Meaning, has been reached on the current pass through the algorithm.

⁹Another realm of generalizations of Buchberger theory concerns weakening the requirement that the coefficient ring of the polynomial ring be a field.

¹⁰NOTATION conundrum Across the disciplines of abstract algebra, computational algebra and computer algebra there is great terminology disparity. For example, one author’s term is another author’s monomial. Our usage of reduction is more or less standard. Our usage of reductum is not. A polynomial f can be written as lead term + lower stuff, where lower stuff indicates the sum of terms other than the lead term. What we call lower stuff is frequently called the reductum of f.
2.1 DEFINITION S is a Groebner basis for I if the above conditions are satisfied.

A  Groebner  basis  for  an  ideal  generates  the  ideal.  This  prompts: 

2.2  DEFINITION  A  set  T  is  a  Groebner  basis  if  it  is  a  Groebner  basis  for  the  ideal  it 
generates. 

The previous univariate example with $S = \{X, X + 1\}$ is an example of a set which is
not  a  Groebner  basis.  In  the  univariate  polynomial  ring,  a  set  is  a  Groebner  basis  if  and 
only  if  it  contains  a  principal  generator  for  the  ideal  it  generates.  In  fact,  for  polynomial 
rings  in  any  number  of  variables,  a  subset  of  a  principal  ideal  is  a  Groebner  basis  for  the 
ideal  if  and  only  if  the  subset  contains  a  principal  generator  for  the  ideal.  Thus  a 
singleton  set  is  always  a  Groebner  basis. 

A  fundamental  application  of  Buchberger  theory  uses  reduction  for  an  ideal  membership 
test. 

2.3  THEOREM  Let  S  be  a  Groebner  basis  for  an  ideal  I  and  let  a  be  an  element  of 
the  polynomial  ring.  The  following  are  equivalent: 

a  lies  in  I. 

a  has  a  reduction  over  S  with  final  reductum  zero. 

All  reductions  of  a  over  S  have  final  reductum  zero. 

COMPLETE REDUCTION In the reduction process, only the lead term of the $f_j$'s gets reduced by elements of S. In the complete reduction process, all terms of the $f_j$'s get reduced by elements of S. The process halts with an $f_j$ which is either zero or has no terms whose monomials are divisible by the lead monomial of an element of S. The complete reduction process always halts. When doing complete reduction of f over S, the final reductum may be referred to as a complete reduction of f over S.

2.4 THEOREM Let S be a Groebner basis and let a be an element of the polynomial ring. The complete reduction of a over S is unique. If T is a Groebner basis which generates the same ideal as S, the complete reduction of a over S equals the complete reduction of a over T.

Thus, complete reduction gives distinguished coset representatives. This allows effective computation in multivariate polynomial rings modulo an ideal. To compute modulo I, find a Groebner basis S for I.¹¹ Given a in the polynomial ring, the distinguished coset representative for a modulo I is the complete reduction of a over S.
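Complete reduction differs from reduction only in what happens when the lead term cannot be reduced. Instead of halting, the process moves that term to the output and continues with the remaining terms. The following minimal Python sketch computes a complete reduction. It assumes lexicographic order and dict-of-exponent-tuple polynomials, and its names are illustrative. It then uses the singleton Groebner basis $\{X^2 + Y\}$ for an ideal membership test and to find a coset representative.

```python
from fractions import Fraction


def lead(p):
    """Lead monomial; assumption: lexicographic order via tuple comparison."""
    return max(p)


def divides(m, n):
    return all(a <= b for a, b in zip(m, n))


def complete_reduce(f, S):
    """Complete reduction of f over S: every reducible term gets reduced."""
    p = {m: Fraction(c) for m, c in f.items() if c}
    r = {}                                   # terms no lead monomial divides
    while p:
        lm = lead(p)
        s = next((s for s in S if divides(lead(s), lm)), None)
        if s is None:                        # irreducible term: move it to r
            r[lm] = p.pop(lm)
            continue
        ls = lead(s)
        c = p[lm] / Fraction(s[ls])
        mono = tuple(a - b for a, b in zip(lm, ls))
        for m, a in s.items():               # p <- p - c * mono * s
            k = tuple(x + y for x, y in zip(m, mono))
            v = p.get(k, 0) - c * a
            if v:
                p[k] = v
            else:
                p.pop(k, None)
    return r


# Variables (X, Y), X > Y; a singleton set is always a Groebner basis.
G = [{(2, 0): 1, (0, 1): 1}]                       # X^2 + Y
a = {(3, 0): 1, (2, 1): 1, (1, 1): 1, (0, 2): 1}   # (X^2 + Y)(X + Y)
print(complete_reduce(a, G))                       # {}: a lies in the ideal
print(complete_reduce({(3, 0): 1}, G))             # X^3 is represented by -XY
```

In the third test case, $X + Y^3$ over $\{Y^2 + 1\}$, plain reduction would stop at once because the lead term $X$ is irreducible. Complete reduction still rewrites $Y^3$ and returns $X - Y$.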

Groebner bases (for an ideal) are not unique. Their cardinality is not unique, and they generally are not minimal generating sets for the ideal. There is a notion of reduced Groebner basis which involves complete reduction. Ideals have unique reduced Groebner bases.¹²


¹¹How to find a Groebner basis for an ideal comes later.

¹²Welllllll, reduced Groebner bases consisting of monic polynomials are unique.
Presenting reduction as the multivariate polynomial analog of univariate polynomial division has pedagogical merits and relies on the following dictionary:

REDUCTION | POLYNOMIAL DIVISION
the set one reduces over | the divisor
the element being reduced | the dividend
the final reductum | the remainder
comparison by multiplicative order | comparison by degree

The analogy has limitations. With R a field, consider the polynomial ring R[X,Y] having the lexicographic multiplicative order with X > Y. The singleton set {Y} is a Groebner basis. Consider the reduction of X over {Y}. X itself is the final reductum, and it is larger than every element of {Y}. With reduction, one cannot be certain that the final reductum will be smaller than the elements of the set one reduces over. Translated to univariate polynomial division, this would be as if the remainder were not necessarily of lower degree than the divisor.


3 GROEBNER BASIS TEST and CONSTRUCTION: in which it is revealed how to test whether a given set S is a Groebner basis and, if S is not a Groebner basis, how to enlarge S to a Groebner basis generating the same ideal. We plead guilty to presenting the easiest material the most leisurely. The pace quickens in this section. The Groebner basis test involves a number of reductions over S. S is a Groebner basis if and only if all the final reductums are zero. If one of the final reductums is not zero, S is not a Groebner basis. However, this nonzero final reductum is an element that can be used to enlarge S and get closer to having a Groebner basis.

3.1 DEFINITION Let f and g be nonzero polynomials with lead monomials $M_f$ and $M_g$ respectively. Let $M_{f,g}$ be the monomial which is the least common multiple of $M_f$ and $M_g$. One can find polynomials F and G where fF and gG each have lead monomial $M_{f,g}$ and fF has the same lead coefficient as gG. Then fF - gG is an S-polynomial of the pair f and g.

Notice that the lead terms of fF and gG cancel in the difference fF - gG. Thus, the lead monomial of an S-polynomial of the pair f and g is always lower than $M_{f,g}$.

F and G may each be chosen as polynomials consisting of single terms, obtained as follows. The monomial of F is $M_{f,g}/M_f$ and the monomial of G is $M_{f,g}/M_g$. The coefficient of F is the lead coefficient of g and the coefficient of G is the lead coefficient of f. More elaborate choices of F and G may lead to computational optimization in the Groebner basis test and construction processes. Allowing general F and G in the definition of S-polynomials has advantages for developing the general theory. Further restrictions may always be placed on F and G in implementations.
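With the conventional single-term choices of F and G, an S-polynomial takes only a few lines to compute. The Python sketch below is illustrative only: lexicographic order, the dict representation and the function names are all assumptions. For f = XY - 1 and g = Y² - 1 it returns X - Y, whose lead monomial X is indeed lower than $M_{f,g} = XY^2$.

```python
from fractions import Fraction


def lead(p):
    """Lead monomial; assumption: lexicographic order via tuple comparison."""
    return max(p)


def add_term_times(p, c, mono, q):
    """Return p + (c * mono) * q, dropping zero coefficients."""
    out = dict(p)
    for m, a in q.items():
        k = tuple(x + y for x, y in zip(m, mono))
        v = out.get(k, 0) + c * a
        if v:
            out[k] = v
        else:
            out.pop(k, None)
    return out


def s_polynomial(f, g):
    """fF - gG with F = lc(g) * M/M_f and G = lc(f) * M/M_g, M = lcm(M_f, M_g)."""
    mf, mg = lead(f), lead(g)
    lcm = tuple(max(a, b) for a, b in zip(mf, mg))
    F_mono = tuple(a - b for a, b in zip(lcm, mf))
    G_mono = tuple(a - b for a, b in zip(lcm, mg))
    fF = add_term_times({}, Fraction(g[mg]), F_mono, f)
    return add_term_times(fF, -Fraction(f[mf]), G_mono, g)


f = {(1, 1): 1, (0, 0): -1}   # XY - 1
g = {(0, 2): 1, (0, 0): -1}   # Y^2 - 1
print(s_polynomial(f, g))     # X - Y, as a dict of exponent tuples
```

Taking the other polynomial's lead coefficient as the scalar avoids division, so non-monic inputs are fine. For example, 2XY - 2 and 3Y² - 3 give 6X - 6Y.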

This  may  not  be  correct  but  it  would  make  good  sense  if  the  S  in  S-polynomial  stands 
for  syzygy.  S-polynomials  are  the  key  to  the  Groebner  basis  test: 

3.2 TEST THEOREM Let S be a set in a polynomial ring which has a multiplicative order. The following are equivalent:

S  is  a  Groebner  basis. 

For each pair of distinct elements $f, g \in S$, there is an S-polynomial of f and g which has a reduction over S with final reductum zero.

For each pair of distinct elements $f, g \in S$, all reductions over S of S-polynomials of f and g have final reductums equal to zero.

S is not assumed to be finite in the theorem. When S is finite, the theorem yields a constructive test of whether S is a Groebner basis. If S is a singleton set, it automatically passes the test because there are no pairs to reduce. As mentioned before, the test underlies Groebner basis construction procedures. Here is one such procedure, which begins with a finite set S and produces a finite Groebner basis for the ideal which S generates.

For pairs of distinct elements $f, g \in S$, form an S-polynomial of f and g and reduce it over S. If the final reductum is always zero, halt: S is a Groebner basis. On encountering a nonzero final reductum, let $T = S \cup \{\text{the nonzero final reductum}\}$. Repeat with T in place of S.

Although  this  procedure  always  terminates  with  a  finite  Groebner  basis  for  the  ideal 
which  S  generates,  the  cardinality  of  the  Groebner  basis  may  be  much  larger  than  the 
cardinality  of  S .  There  are  many  optimizations  which  can  be  made.  For  example,  as 
described,  the  procedure  will  duplicate  many  computations. 
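The test theorem and the construction procedure above can be combined into a short, deliberately naive program. The Python sketch below is an illustration under several assumptions: lexicographic order, exact rational arithmetic, the first eligible element of S as divisor, and hypothetical function names. It makes none of the optimizations just mentioned. As the procedure is stated, it restarts the scan over pairs after every new element.

```python
from fractions import Fraction
from itertools import combinations


def lead(p):
    """Lead monomial; assumption: lexicographic order via tuple comparison."""
    return max(p)


def divides(m, n):
    return all(a <= b for a, b in zip(m, n))


def add_term_times(p, c, mono, q):
    """Return p + (c * mono) * q, dropping zero coefficients."""
    out = dict(p)
    for m, a in q.items():
        k = tuple(x + y for x, y in zip(m, mono))
        v = out.get(k, 0) + c * a
        if v:
            out[k] = v
        else:
            out.pop(k, None)
    return out


def reduce_lead(f, S):
    """Reduce the lead term of f over S until it is zero or irreducible."""
    while f:
        lm = lead(f)
        s = next((s for s in S if divides(lead(s), lm)), None)
        if s is None:
            break
        ls = lead(s)
        mono = tuple(a - b for a, b in zip(lm, ls))
        f = add_term_times(f, -Fraction(f[lm]) / Fraction(s[ls]), mono, s)
    return f


def s_polynomial(f, g):
    mf, mg = lead(f), lead(g)
    lcm = tuple(max(a, b) for a, b in zip(mf, mg))
    fF = add_term_times({}, Fraction(g[mg]), tuple(a - b for a, b in zip(lcm, mf)), f)
    return add_term_times(fF, -Fraction(f[mf]), tuple(a - b for a, b in zip(lcm, mg)), g)


def is_groebner(S):
    """Test theorem 3.2: every S-polynomial reduces to zero over S."""
    return all(not reduce_lead(s_polynomial(f, g), S) for f, g in combinations(S, 2))


def groebner(S):
    """Enlarge S by nonzero final reductums until the test passes."""
    G = [{m: Fraction(c) for m, c in p.items() if c} for p in S]
    G = [p for p in G if p]
    while True:
        for f, g in combinations(G, 2):
            r = reduce_lead(s_polynomial(f, g), G)
            if r:
                G.append(r)      # T = S u {nonzero final reductum}
                break            # repeat with T in place of S
        else:
            return G             # every final reductum was zero


G = groebner([{(1, 1): 1, (0, 0): -1}, {(0, 2): 1, (0, 0): -1}])  # XY - 1, Y^2 - 1
print(len(G), is_groebner(G))   # 3 True: X - Y has been adjoined
```

Here $\{XY - 1, Y^2 - 1\}$ fails the test because its S-polynomial $X - Y$ is irreducible. Adjoining $X - Y$ produces a set that passes.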


4 A NON-IDEAL APPLICATION So far, Groebner bases have been used in connection with ideals. We end with an application not concerning ideals.¹³ The application is the matter of subalgebra membership determination and appears in [59]. Suppose $f, g_1, \ldots, g_r$ lie in the polynomial ring $R[X_1, \ldots, X_p]$. The question is to determine whether f lies in $R[g_1, \ldots, g_r]$, the subalgebra of $R[X_1, \ldots, X_p]$ generated by $\{g_1, \ldots, g_r\}$. As part of the solution, we introduce additional variables. This is typical of Groebner basis applications and is what is meant by construction lines in the introduction. In this case, we introduce an additional variable $T_i$ for each $g_i$.

Let A be the polynomial ring $R[X_1, \ldots, X_p, T_1, \ldots, T_r]$. Choose a multiplicative order on A where each $X_j$ is larger than all monomials involving only $\{T_1, \ldots, T_r\}$. For example, the lexicographic order has this property. Construct a Groebner basis G which generates the same ideal as $\{g_1 - T_1, \ldots, g_r - T_r\}$.¹⁴ Considering $R[X_1, \ldots, X_p] \subset A$, we may think of f as lying in A. Reduce f over G and let $h \in A$ be the final reductum. The answer to the subalgebra membership question is given by:

f lies in $R[g_1, \ldots, g_r]$ if and only if h lies in $R[T_1, \ldots, T_r]$.

If h lies in $R[T_1, \ldots, T_r]$, so that $h = h(T_1, \ldots, T_r)$, then $f = h(g_1, \ldots, g_r)$.
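The tag-variable construction can be tried out on a small case. The self-contained Python sketch below is an illustration only. It uses a naive Buchberger loop with complete reduction and lexicographic order with the X's listed before the T's, so each $X_j$ exceeds every monomial in the T's alone. All function names are hypothetical. With one variable X, $g_1 = X^2$ and $g_2 = X^3$, it finds $X^5 = g_1 g_2$ and $X^4 + X^3 = g_1^2 + g_2$, and it rejects X.

```python
from fractions import Fraction
from itertools import combinations


def lead(p):
    """Lead monomial; lexicographic order with the X slots before the T slots."""
    return max(p)


def divides(m, n):
    return all(a <= b for a, b in zip(m, n))


def add_term_times(p, c, mono, q):
    """Return p + (c * mono) * q, dropping zero coefficients."""
    out = dict(p)
    for m, a in q.items():
        k = tuple(x + y for x, y in zip(m, mono))
        v = out.get(k, 0) + c * a
        if v:
            out[k] = v
        else:
            out.pop(k, None)
    return out


def complete_reduce(f, S):
    """Complete reduction of f over S: every reducible term gets reduced."""
    p, r = dict(f), {}
    while p:
        lm = lead(p)
        s = next((s for s in S if divides(lead(s), lm)), None)
        if s is None:
            r[lm] = p.pop(lm)          # irreducible term goes to the result
            continue
        ls = lead(s)
        mono = tuple(a - b for a, b in zip(lm, ls))
        p = add_term_times(p, -p[lm] / s[ls], mono, s)
    return r


def s_polynomial(f, g):
    mf, mg = lead(f), lead(g)
    lcm = tuple(max(a, b) for a, b in zip(mf, mg))
    fF = add_term_times({}, g[mg], tuple(a - b for a, b in zip(lcm, mf)), f)
    return add_term_times(fF, -f[mf], tuple(a - b for a, b in zip(lcm, mg)), g)


def groebner(S):
    """Naive construction: adjoin nonzero reductums until all reduce to zero."""
    G = list(S)
    while True:
        for f, g in combinations(G, 2):
            r = complete_reduce(s_polynomial(f, g), G)
            if r:
                G.append(r)
                break
        else:
            return G


def in_subalgebra(f, gs, n):
    """Decide whether f lies in R[g_1, ..., g_r] by the tag-variable method.

    f and each g_i are dicts over n exponent slots (the X's).  Returns the
    answer and h, the final reductum in R[X_1..X_n, T_1..T_r].
    """
    r = len(gs)

    def embed(p):                          # view p as an element of A
        return {m + (0,) * r: Fraction(c) for m, c in p.items() if c}

    tagged = []
    for i, g in enumerate(gs):             # g_i - T_i
        gi = embed(g)
        gi[(0,) * n + tuple(int(j == i) for j in range(r))] = Fraction(-1)
        tagged.append(gi)
    h = complete_reduce(embed(f), groebner(tagged))
    return all(not any(m[:n]) for m in h), h


gs = [{(2,): 1}, {(3,): 1}]                # g_1 = X^2, g_2 = X^3
print(in_subalgebra({(5,): 1}, gs, 1))     # answer True, h = T_1 T_2
print(in_subalgebra({(1,): 1}, gs, 1))     # answer False, h = X
```

The returned h also gives the expression of f in terms of the $g_i$ when the answer is yes. For example, $h = T_1^2 + T_2$ for $X^4 + X^3$.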


¹³Welllllll, ideals creep in.

¹⁴In Groebner basis applications, the additional variables are often referred to as tag variables when they are used to tag other elements in the problem.




5 WHERE THE ACTION ISN’T North American academic mathematics departments! As a whole, North American algebraists in academic mathematics departments appear to have computer anxiety or computation anxiety. Conferences and bibliographies pertaining to computational and computer algebra have a remarkably low fraction of contributors from North American academic mathematics departments. In their place one finds algebraists from Europe, from industry, and from computer science and other academic departments. North American algebraists in academic mathematics departments are appallingly ignorant of even the most elementary, yet relevant, developments in computational algebra.
ABSTRACT

This paper will introduce a generalized harmonic balance method and illustrate the use of symbolic computing to solve a class of nonlinear vibration problems. The program package MACSYMA is used in this demonstration.

First, a forced vibration problem with several different types of nonlinearities is given. An outline of the method will be described next. This will be followed by symbolic computing statements and programs which will crank out asymptotic solutions in a routine manner. Solutions for a subharmonic case and a superharmonic case will be given. Thus the ease of obtaining results for these otherwise extremely complicated problems will be shown.




1. INTRODUCTION


Since the introduction of symbolic computation as a tool for mathematical analysis, it has been increasingly used for solving problems that require laborious and repetitious mathematical manipulations. In particular, it is found extremely helpful for solving nonlinear differential equations in conjunction with various perturbation methods [1]. In this paper, we will introduce a generalized harmonic balance method for solutions of a class of nonlinear vibration problems. We will also demonstrate the use of MACSYMA, a very powerful and popular symbolic computation software package, to obtain these solutions in a routine manner. We will consider the vibrational response of a nonlinear single degree-of-freedom system with quadratic and cubic nonlinearities, governed by the equation


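$$\frac{d^2u}{dt^2} + u + 2\epsilon\mu\frac{du}{dt} + \epsilon\alpha_2 u^2 + \epsilon^2\alpha_3 u^3 + \epsilon\alpha_4\left(\frac{du}{dt}\right)^2 + \epsilon^2\alpha_5 u\left(\frac{du}{dt}\right)^2 = 2f\cos(\Omega t) \qquad (1)$$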

where $\Omega = 2 + \epsilon\sigma$. This same equation has been considered by Nayfeh [2] using the method of multiple scaling. Our objective is to obtain his key results by a method that avoids eliminating secular terms, repeatedly solving intermediate differential equations, and Nayfeh's reconstitution method. We thereby demonstrate that the multiple scaling results can be derived from our solution approach. Although we solve only this specific example, the proposed method is quite general and can be used for any problem where the nonlinearities are polynomials in u and du/dt; see, e.g., Nayfeh and Mook [2] and Nayfeh [3] and [4].

In both Nayfeh [3] and Nayfeh and Mook [2], Sect. 2.3.4, it is shown that the method of harmonic balance can lead to erroneous results if applied simply in a routine fashion. Quoting from p. 61 of the latter reference: "... to obtain a consistent solution by using the method of harmonic balance one needs either to know a great deal about the solution a priori or to carry out enough terms in the solution and check the order of the coefficients of all the neglected harmonics, ... Therefore we prefer not to use this technique." In this paper, we avoid both of these obstacles by using the beginning steps of multiple scaling to tell the form of the solution (see equation (18) at the end of Section 2). This gives us the a priori information we require and also enables us to see clearly which harmonics have to be taken into account and which can be neglected.
2. FORM OF SOLUTION VIA MULTIPLE SCALING

As emphasized already, the key to the success of our variant of the harmonic balance method is to know the form of the solution. By this we mean the dependence of the solution on the small quantity ε, which is a measure of the nonlinearity. We obtain this form by the multiple scaling approach, but without the laborious tasks of suppressing the secular terms, obtaining explicit solutions of the intermediate differential equations, and reconstituting the final solution.

To illustrate the point, the multiple scale method as applied to (1) assumes that (cf. Nayfeh [3], p. 461):

$$u(t,\epsilon) = u_0(T_0,T_1,\ldots) + \epsilon u_1(T_0,T_1,\ldots) + \epsilon^2 u_2(T_0,T_1,\ldots) + \cdots \qquad (2)$$

where

$$T_n = \epsilon^n t, \quad n = 0, 1, 2, \ldots \qquad (3)$$


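Using Nayfeh's notation, $D_n = \partial/\partial T_n$, one has

$$\frac{d}{dt} = D_0 + \epsilon D_1 + \epsilon^2 D_2 + \cdots, \qquad \frac{d^2}{dt^2} = D_0^2 + 2\epsilon D_0 D_1 + \epsilon^2\left(2D_0D_2 + D_1^2\right) + \cdots \qquad (4)$$

Substituting (2) and (4) in (1) and equating coefficients of like powers of ε, one obtains (Nayfeh [3], p. 466, (32)-(34)):

$$D_0^2 u_0 + u_0 = 2f\cos(\Omega T_0) \qquad (5)$$

$$D_0^2 u_1 + u_1 = -2D_0D_1u_0 - 2\mu D_0u_0 - \alpha_2u_0^2 - \alpha_4(D_0u_0)^2 \qquad (6)$$

$$\begin{aligned} D_0^2 u_2 + u_2 = {} & -2D_0D_1u_1 - \left(D_1^2 + 2D_0D_2\right)u_0 - 2\mu\left(D_1u_0 + D_0u_1\right) - 2\alpha_2u_0u_1 - \alpha_3u_0^3 \\ & - 2\alpha_4D_0u_0\left(D_1u_0 + D_0u_1\right) - \alpha_5u_0(D_0u_0)^2 \end{aligned} \qquad (7)$$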

Equations (5)-(7) are obtained easily using the basic steps of the multiple scaling represented by (2) and (3). These steps are the only part of the multiple scaling used in the present approach. Yet (5)-(7) are significant because they provide the form of the desired solution, as will be described below.

In order to avoid the labor of carrying unnecessary terms (and there are many of them), one must keep track of the relative order of the various terms. We begin by noting that the $u_k$ (k = 0, 1, 2, ...) in (2) are of order unity, or O(1).


Taking the case of subharmonic response as an example,

$$\Omega = 2 + \epsilon\sigma \qquad (8)$$

where σ is O(1) and is called the detuning parameter. Eqn. (5) can now be written as

$$D_0^2u_0 + u_0 = fSA^2 + \text{c.c.} \qquad (9)$$

where

$$S = e^{i\sigma T_1}, \qquad A = e^{iT_0} \qquad (10)$$

and c.c. stands for the complex conjugate.

It is important to note that S is a slowly varying function compared with A, in the sense that while dA/dt is O(1), dS/dt is O(ε).


It is easily observed from (9) that

$$u_0 = P_{01}(T_1,T_2,\ldots)e^{iT_0} + P_{02}(T_1,T_2,\ldots)e^{2iT_0} + \text{c.c.} = P_{01}(\epsilon t,\epsilon^2 t,\ldots)A + P_{02}(\epsilon t,\epsilon^2 t,\ldots)A^2 + \text{c.c.} \qquad (11)$$

with

$$A = e^{it} \qquad (12)$$

and that $P_{01}$ and $P_{02}$ are some functions of t whose specific forms are not the concern here. However, it is important to note that $P_{01}$ and $P_{02}$ are slowly varying functions of t compared with A, in the sense that while dA/dt is O(1), the derivatives of $P_{01}$ and $P_{02}$ are O(ε). We shall also use the fact that

$$\bar{A} = e^{-it}, \qquad A\bar{A} = 1 \qquad (13)$$

Next we substitute (11) into the right-hand side of (6), obtaining a polynomial in $A^k$, k = 0, 1, 2, 3, 4. Hence, it is easily observed that the solution of (6) can be written as

$$u_1 = P_{10} + \left(P_{11}A + P_{12}A^2 + P_{13}A^3 + P_{14}A^4 + \text{c.c.}\right) \qquad (14)$$

where, again, $P_{1k}$, k = 0, 1, 2, 3 and 4, are slowly varying functions of t compared with A. Now, substituting (11) and (14) into (7) and going through a similar process as before, one can easily write

$$u_2 = P_{20} + \left(P_{21}A + P_{22}A^2 + P_{23}A^3 + P_{24}A^4 + P_{25}A^5 + P_{26}A^6 + \text{c.c.}\right) \qquad (15)$$

Once again, the $P_{2k}$ (k = 0, 1, ..., 6) are slowly varying compared with A.

Hence, to obtain the form of the solution u, one substitutes (11), (14) and (15) in (2) and collects terms of the same powers of A. The result is

$$u = \epsilon U_0 + \left[(U_1A + U_2A^2) + \epsilon(U_3A^3 + U_4A^4) + \epsilon^2(U_5A^5 + U_6A^6) + \text{c.c.}\right] \qquad (16)$$
where

$$U_0 = P_{10} + \epsilon P_{20}, \quad U_1 = P_{01} + \epsilon P_{11} + \epsilon^2 P_{21}, \quad U_2 = P_{02} + \epsilon P_{12} + \epsilon^2 P_{22},$$
$$U_3 = P_{13} + \epsilon P_{23}, \quad U_4 = P_{14} + \epsilon P_{24}, \quad U_5 = P_{25}, \quad U_6 = P_{26} \qquad (17)$$

It is then also clear that the $U_k$ (k = 0, 1, 2, ..., 6) are O(1) and slowly varying compared with A. The approximate solution (16) of u is good to the order of ε². To obtain a solution of u good to the order of ε, we shall drop the $U_5$ and $U_6$ terms, so that the final form of the solution used in this paper is

$$u = \epsilon U_0 + \left[(U_1A + U_2A^2) + \epsilon(U_3A^3 + U_4A^4) + \text{c.c.}\right] \qquad (18)$$

In the next two sections, we shall derive expressions for the $U_k$ with the help of the MACSYMA program for the subharmonic and superharmonic vibrations.
3. THE TRANSIENT SUBHARMONIC CASE

Suppose one wishes to obtain the first two-term approximation, $u = u_0 + \epsilon u_1$, in the solution of (1) as a power series in ε. The procedure is then to substitute (18) in (1) and set to zero the coefficients of $A^k$, k = 0, 1, 2, 3 and 4. First, all the terms in (1) will be written as power series in ε, dropping those of O(ε²):

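$$\frac{du}{dt} = \left(\frac{dU_1}{dt} + iU_1\right)A + \left(\frac{dU_2}{dt} + 2iU_2\right)A^2 + \epsilon\left[\left(\frac{dU_3}{dt} + 3iU_3\right)A^3 + \left(\frac{dU_4}{dt} + 4iU_4\right)A^4\right] + \text{c.c.} \qquad (19)$$

$$\begin{aligned} \frac{d^2u}{dt^2} = {} & \left(\frac{d^2U_1}{dt^2} + 2i\frac{dU_1}{dt} - U_1\right)A + \left(\frac{d^2U_2}{dt^2} + 4i\frac{dU_2}{dt} - 4U_2\right)A^2 \\ & + \epsilon\left[\left(6i\frac{dU_3}{dt} - 9U_3\right)A^3 + \left(8i\frac{dU_4}{dt} - 16U_4\right)A^4\right] + \text{c.c.} \end{aligned} \qquad (20)$$

$$u^2 = 2U_1\bar{U}_1 + 2U_2\bar{U}_2 + \left[2\bar{U}_1U_2A + U_1^2A^2 + 2U_1U_2A^3 + U_2^2A^4 + 2\epsilon\left(U_0U_1 + \bar{U}_2U_3\right)A + \text{c.c.}\right] \qquad (21)$$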

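Since $u^3$ appears with a coefficient of $\epsilon^2$ in (1), one only needs to keep terms of O(1) in the expansion:

$$\begin{aligned} u^3 = {} & 3U_1^2\bar{U}_2 + 3\bar{U}_1^2U_2 + \Big[\left(3U_1^2\bar{U}_1 + 6U_1U_2\bar{U}_2\right)A + \left(3U_2^2\bar{U}_2 + 6U_1\bar{U}_1U_2\right)A^2 \\ & + \left(U_1^3 + 3\bar{U}_1U_2^2\right)A^3 + 3U_1^2U_2A^4 + \text{c.c.}\Big] \end{aligned} \qquad (22)$$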

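Similarly, one has

$$\begin{aligned}
\left(\frac{du}{dt}\right)^2 ={}& 2U_1\bar U_1 + 8U_2\bar U_2 + 2i\left[\left(U_1\frac{d\bar U_1}{dt} - \bar U_1\frac{dU_1}{dt}\right) + 2\left(U_2\frac{d\bar U_2}{dt} - \bar U_2\frac{dU_2}{dt}\right)\right]\\
&+ \Big[\left(4\bar U_1U_2 - 2i\bar U_1\frac{dU_2}{dt} + 4iU_2\frac{d\bar U_1}{dt} + 12\epsilon\bar U_2U_3\right)\Lambda\\
&+ \left(-U_1^2 + 2iU_1\frac{dU_1}{dt} + 6\epsilon\bar U_1U_3 + 16\epsilon\bar U_2U_4\right)\Lambda^2\\
&+ \left(-4U_1U_2 + 2iU_1\frac{dU_2}{dt} + 4iU_2\frac{dU_1}{dt} + 8\epsilon\bar U_1U_4\right)\Lambda^3\\
&+ \left(-4U_2^2 + 4iU_2\frac{dU_2}{dt} - 6\epsilon U_1U_3\right)\Lambda^4 + \text{c.c.}\Big]
\end{aligned} \tag{23}$$

$$u\left(\frac{du}{dt}\right)^2 = 3U_1^2\bar U_2 + 3\bar U_1^2U_2 + \left[\left(U_1^2\bar U_1 + 8U_1U_2\bar U_2\right)\Lambda + \left(2U_1\bar U_1U_2 + 4U_2^2\bar U_2\right)\Lambda^2 - U_1^3\Lambda^3 - 5U_1^2U_2\Lambda^4 + \text{c.c.}\right] \tag{24}$$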
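The $O(1)$ products (22) and (24) can be checked quickly by brute force. Multiply the truncated series for $u$ and $du/dt$ as Laurent polynomials in $\Lambda$, then compare the result with the closed forms term by term.

The Python sketch below does this numerically for random complex $U_1$, $U_2$. As in (22) and (24), it assumes that the slow derivatives $dU_k/dt$ are dropped. The function names are illustrative only and are not part of the authors' MACSYMA program.

```python
import random

def mul(p, q):
    """Multiply two truncated Fourier series in Lambda stored as {power: coefficient}."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return out

def series(c1, c2):
    """Series c1*Lambda + c2*Lambda**2 + complex conjugate."""
    return {1: c1, 2: c2, -1: c1.conjugate(), -2: c2.conjugate()}

def max_mismatch(seed=0):
    """Largest gap between brute-force products and the closed forms (22) and (24)."""
    rng = random.Random(seed)
    U1 = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    U2 = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    B1, B2 = U1.conjugate(), U2.conjugate()  # B1 = conj(U1), B2 = conj(U2)
    u = series(U1, U2)
    du = series(1j * U1, 2j * U2)  # du/dt with the slow derivatives dU_k/dt dropped
    u3 = mul(mul(u, u), u)
    udu2 = mul(u, mul(du, du))
    closed_u3 = {0: 3*U1**2*B2 + 3*B1**2*U2, 1: 3*U1**2*B1 + 6*U1*U2*B2,
                 2: 3*U2**2*B2 + 6*U1*B1*U2, 3: U1**3 + 3*B1*U2**2, 4: 3*U1**2*U2}
    closed_udu2 = {0: 3*U1**2*B2 + 3*B1**2*U2, 1: U1**2*B1 + 8*U1*U2*B2,
                   2: 2*U1*B1*U2 + 4*U2**2*B2, 3: -U1**3, 4: -5*U1**2*U2}
    return max(max(abs(u3[k] - closed_u3[k]), abs(udu2[k] - closed_udu2[k]))
               for k in range(5))

print(max_mismatch() < 1e-12)  # True
```

The same bookkeeping extends to the $\epsilon$-order terms of (21) and (23) if the series are widened to include $\epsilon U_3\Lambda^3$ and $\epsilon U_4\Lambda^4$.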

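We now substitute (18) and (19)-(24) in (1), collect terms of like powers $\Lambda^k$, $k = 0, 1, \ldots, 4$, and then set the coefficients to zero. The resulting equations are, for the $\Lambda^0$ coefficient,

$$\epsilon\left[U_0 + 2a_2\left(U_1\bar U_1 + U_2\bar U_2\right) + 2a_4\left(U_1\bar U_1 + 4U_2\bar U_2\right)\right] = 0, \tag{25}$$

for $\Lambda^1$,

$$\begin{aligned}
&2i\left(\frac{dU_1}{dt} + \epsilon\mu U_1\right) + 2\epsilon(a_2 + 2a_4)\bar U_1U_2 + \frac{d^2U_1}{dt^2} + 2\epsilon\mu\frac{dU_1}{dt} + 2i\epsilon a_4\left(2U_2\frac{d\bar U_1}{dt} - \bar U_1\frac{dU_2}{dt}\right)\\
&+ \epsilon^2\left[2a_2\left(U_0U_1 + \bar U_2U_3\right) + 3a_3\left(U_1^2\bar U_1 + 2U_1U_2\bar U_2\right) + 12a_4\bar U_2U_3 + a_5\left(U_1^2\bar U_1 + 8U_1U_2\bar U_2\right)\right] = 0,
\end{aligned} \tag{26}$$

for $\Lambda^2$,

$$-3U_2 + 4i\frac{dU_2}{dt} + \epsilon\left[4i\mu U_2 + (a_2 - a_4)U_1^2\right] = fS, \tag{27}$$

for $\Lambda^3$,

$$\epsilon\left[-8U_3 + 2(a_2 - 2a_4)U_1U_2\right] = 0, \tag{28}$$

and, finally, for $\Lambda^4$,

$$\epsilon\left[-15U_4 + (a_2 - 4a_4)U_2^2\right] = 0. \tag{29}$$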

Equations (25)-(29) can again be conveniently put in tabulated form, as in TABLE I.


TABLE I. TABULATED EQUATIONS INDICATING RELATIVE ORDER OF TERMS IN THE TRANSIENT, SUBHARMONIC CASE

$$
\begin{array}{c|c|c|c|c}
 & \epsilon^0 & \epsilon^1 & \epsilon^2 & \text{RHS}\\ \hline
\Lambda^0 & 0 & \epsilon\left[U_0 + 2a_2\left(U_1\bar U_1 + U_2\bar U_2\right) + 2a_4\left(U_1\bar U_1 + 4U_2\bar U_2\right)\right] & \ast\ast\ast & 0\\
\Lambda^1 & 0 & 2i\left(\dfrac{dU_1}{dt} + \epsilon\mu U_1\right) + 2\epsilon(a_2 + 2a_4)\bar U_1U_2 & \begin{array}{l}\dfrac{d^2U_1}{dt^2} + 2\epsilon\mu\dfrac{dU_1}{dt} + 2i\epsilon a_4\left(2U_2\dfrac{d\bar U_1}{dt} - \bar U_1\dfrac{dU_2}{dt}\right)\\ + 2\epsilon^2\left[a_2\left(U_0U_1 + \bar U_2U_3\right) + 6a_4\bar U_2U_3\right]\\ + \epsilon^2\left[3a_3\left(U_1^2\bar U_1 + 2U_1U_2\bar U_2\right) + a_5\left(U_1^2\bar U_1 + 8U_1U_2\bar U_2\right)\right]\end{array} & 0\\
\Lambda^2 & -3U_2 & 4i\dfrac{dU_2}{dt} + \epsilon\left[4i\mu U_2 + (a_2 - a_4)U_1^2\right] & \ast\ast\ast & fS\\
\Lambda^3 & 0 & \epsilon\left[-8U_3 + 2(a_2 - 2a_4)U_1U_2\right] & \ast\ast\ast & 0\\
\Lambda^4 & 0 & \epsilon\left[-15U_4 + (a_2 - 4a_4)U_2^2\right] & \ast\ast\ast & 0
\end{array}
$$

Note that RHS indicates the right-hand side of the equation and *** indicates terms not needed for the present approximation.

The end goal here is to obtain an equation that contains the information on the relationship between the amplitude and the frequency. Since we have five unknowns and five equations, we can reduce them to one equation with a single unknown. Also, from (18), it is clear that $U_1$ and $U_2$ are more significant than $U_0$, $U_3$ and $U_4$. The sense is as follows: to obtain a solution $u(t)$ accurate to $O(\epsilon^2)$, $U_1$ and $U_2$ must be accurate to $O(\epsilon^2)$, whereas $U_0$, $U_3$ and $U_4$ need only be accurate to $O(\epsilon)$. This relative significance in order also affords us an easy way to solve these equations by an iterative procedure.

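The dominant term of the solution is $U_2$; from (27), to leading order,

$$U_2 = -\tfrac{1}{3}fS = -\tfrac{1}{3}fe^{i\epsilon\sigma t}. \tag{30}$$

In terms of $U_1$ and $U_2$, one has, to the first approximation,

$$U_0 = -2(a_2 + a_4)U_1\bar U_1 - 2(a_2 + 4a_4)U_2\bar U_2, \tag{31}$$

$$U_3 = \tfrac{1}{4}(a_2 - 2a_4)U_1U_2, \tag{32}$$

$$U_4 = \tfrac{1}{15}(a_2 - 4a_4)U_2^2. \tag{33}$$

The differential equation that determines $U_1$ is

$$\frac{dU_1}{dt} = -\epsilon\left[\mu U_1 - i(a_2 + 2a_4)\bar U_1U_2\right]. \tag{34}$$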
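Because (34) is a single complex equation, its behavior is easy to explore numerically. The sketch below integrates (34) with a classical fourth-order Runge-Kutta step, taking $U_2$ from (30) with $S = e^{i\epsilon\sigma t}$. The parameter values are hypothetical and chosen only for illustration.

For $\sigma = 0$ one can check by hand that, when $(a_2 + 2a_4)f/3 > \mu$, the combination $\mathrm{Re}\,U_1 - \mathrm{Im}\,U_1$ grows like $\exp\{\epsilon[(a_2 + 2a_4)f/3 - \mu]t\}$.

```python
import cmath

def rhs(t, U1, eps, mu, sigma, a2, a4, f):
    """Right-hand side of (34); U2 is the leading-order result (30), S = exp(i*eps*sigma*t)."""
    U2 = -(f / 3.0) * cmath.exp(1j * eps * sigma * t)
    return -eps * (mu * U1 - 1j * (a2 + 2 * a4) * U1.conjugate() * U2)

def integrate(U1_0, T, dt, **p):
    """Classical fourth-order Runge-Kutta march of (34) from t = 0 to t = T.

    Complex arithmetic here is equivalent to RK4 on the real pair (Re U1, Im U1),
    so the conjugate in the right-hand side causes no difficulty."""
    t, U1 = 0.0, complex(U1_0)
    for _ in range(int(round(T / dt))):
        k1 = rhs(t, U1, **p)
        k2 = rhs(t + dt / 2, U1 + dt / 2 * k1, **p)
        k3 = rhs(t + dt / 2, U1 + dt / 2 * k2, **p)
        k4 = rhs(t + dt, U1 + dt * k3, **p)
        U1 += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return U1

# Hypothetical parameters: (a2 + 2*a4) f / 3 = 0.5 exceeds mu = 0.05, with sigma = 0.
params = dict(eps=0.1, mu=0.05, sigma=0.0, a2=1.0, a4=0.0, f=1.5)
print(abs(integrate(0.01, 200.0, 0.05, **params)))
```

Since (34) is linear in $U_1$, it can only predict growth or decay. Limiting the amplitude requires the cubic terms in $U_1$ that appear at $O(\epsilon^2)$ in (38).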

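To improve the solution to the next order of accuracy, one includes the $O(\epsilon)$ terms in (27) for $U_2$ and the $O(\epsilon^2)$ terms in (26) for the differential equation for $U_1$. Hence

$$U_2 = -\tfrac{1}{3}fS + \tfrac{1}{3}\epsilon\left[(a_2 - a_4)U_1^2 - \tfrac{4}{3}(i\mu - \sigma)fS\right], \tag{35}$$

$$\bar U_2 = -\tfrac{1}{3}f\bar S + \tfrac{1}{3}\epsilon\left[(a_2 - a_4)\bar U_1^2 + \tfrac{4}{3}(i\mu + \sigma)f\bar S\right]. \tag{36}$$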
Also, differentiating (34) gives

$$\frac{d^2U_1}{dt^2} = -\epsilon\left[\mu\frac{dU_1}{dt} - i(a_2 + 2a_4)\left(U_2\frac{d\bar U_1}{dt} + \bar U_1\frac{dU_2}{dt}\right)\right]. \tag{37}$$

Equations (35)-(37) can be substituted in (26) to obtain an improved first-order differential equation for $U_1$:

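$$\begin{aligned}
&2i\left(\frac{dU_1}{dt} + \epsilon\mu U_1\right) - \frac{2}{3}\epsilon(a_2 + 2a_4)fS\bar U_1\\
&+ \epsilon^2\Big\{\Big[-\mu^2 + \frac{2}{9}f^2(3a_3 + 4a_5) - \frac{1}{18}f^2\left(5a_2^2 + 12a_2a_4 - 12a_4^2\right)\Big]U_1\\
&\quad + \frac{1}{3}\left(9a_3 + 3a_5 - 10a_2^2 - 10a_2a_4 - 4a_4^2\right)U_1^2\bar U_1\\
&\quad - \frac{4}{9}i\mu(2a_2 + a_4)fS\bar U_1 + \frac{1}{9}\sigma(11a_2 + 16a_4)fS\bar U_1\Big\} = 0
\end{aligned} \tag{38}$$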

It is not difficult to show that (38) is identical to Nayfeh's equation (81) in [1] on replacing $\Omega$ with $2 + \epsilon\sigma$, $\omega_0$ with unity and $U_1$ with $A$. In this comparison, however, caution must be used in obtaining the expanded terms of $O(\epsilon^2)$ in the last expression in Nayfeh's equation. Specifically, $\Omega = 2 + \epsilon\sigma$ and $\Omega^2 = (2 + \epsilon\sigma)^2 \approx 4 + 4\epsilon\sigma$ should be used for Nayfeh's parameter $\Lambda = f/(1 - \Omega^2)$ in the expansion, rather than dropping the higher-order terms too early.
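The caution about expanding $f/(1 - \Omega^2)$ can be checked with a few lines of arithmetic. Using $\Omega^2 \approx 4 + 4\epsilon\sigma$ gives the two-term form $-(f/3)(1 - \tfrac{4}{3}\epsilon\sigma)$, which matches the $\sigma$-dependent part of (35).

The Python sketch below (function names hypothetical) compares this two-term form with the unexpanded expression. It shows that the remaining error shrinks like $\epsilon^2$.

```python
def exact_amplitude(f, eps, sigma):
    """f / (1 - Omega**2) evaluated at Omega = 2 + eps*sigma, with no expansion."""
    omega = 2.0 + eps * sigma
    return f / (1.0 - omega ** 2)

def two_term(f, eps, sigma):
    """Two-term expansion obtained from Omega**2 = 4 + 4*eps*sigma + O(eps**2)."""
    return -(f / 3.0) * (1.0 - (4.0 / 3.0) * eps * sigma)

for eps in (1e-1, 1e-2, 1e-3):
    err = abs(exact_amplitude(1.0, eps, 1.0) - two_term(1.0, eps, 1.0))
    # err / eps**2 settles near 13/27 as eps -> 0, i.e. the residual is O(eps**2)
    print(f"eps={eps:g}  error={err:.3e}  error/eps^2={err / eps**2:.4f}")
```

Replacing the factor 4 in $4\epsilon\sigma$ with any other value would leave an $O(\epsilon)$ error, which would then contaminate the $O(\epsilon^2)$ terms of the comparison.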

The procedure described so far is automated via the MACSYMA program that follows. Remarks are contained between the symbols /* and */ in the program. The equation obtained by setting the final expression (D56) of the MACSYMA program to zero is identical to Eq. (38), as it should be.


[MACSYMA program listing for the transient subharmonic case; the printout is not legible in this copy.]


4. THE SUPERHARMONIC CASE

Next, we demonstrate the technique by obtaining the solution for a superharmonic case. Here the frequency $\Omega$ of the forcing function is close to one-half of the natural frequency of the system. That is,

$$\Omega = \tfrac{1}{2} + \epsilon\sigma, \quad \text{or} \quad 2\Omega = 1 + 2\epsilon\sigma. \tag{39}$$

By using (39) instead of (8) in Section 2 of this paper, it is clear that all the steps remain unaltered except that $S$ and $\Lambda$ in (10) are replaced by

$$S = e^{i\epsilon\sigma t} \quad \text{and} \quad \Lambda = e^{it/2}. \tag{40}$$

Thus we can conclude that (18) is still the form of the solution of (1) for the superharmonic case, with $\Lambda$ given by (40). Now the task is simple. Substituting (18) in (1), with proper consideration of $\Lambda$ in (40), one arrives at a similar set of equations that can be solved as before. Here, we shall use the following simple case of (1):

$$\frac{d^2u}{dt^2} + u + 2\epsilon\mu\frac{du}{dt} + \epsilon a_2u^2 = fe^{i\Omega t} + \text{c.c.} \tag{41}$$


or,

$$\frac{d^2u}{dt^2} + u + 2\epsilon\mu\frac{du}{dt} + \epsilon a_2u^2 = fS\Lambda + \text{c.c.} \tag{42}$$


Substituting (18) in (42), one obtains the set of equations in TABLE II, which is similar to TABLE I.

The final equation in $U_2$ is obtained by solving the equations of TABLE II iteratively as before. It reads:


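$$\begin{aligned}
&2i\left(\frac{dU_2}{dt} + \epsilon\mu U_2\right) + \frac{16}{9}\epsilon a_2f^2S^2\\
&+ \epsilon^2\left\{\left[-\mu^2 - \frac{64}{9}\cdot\frac{23}{15}a_2^2f^2\right]U_2 - \frac{10}{3}a_2^2U_2^2\bar U_2 + \frac{8}{27}(5\sigma - 13i\mu)a_2f^2S^2\right\} = 0
\end{aligned} \tag{43}$$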

Again, it is easily shown that (43) is identical to the result obtained by Nayfeh using the method of multiple scales (equation (2.46) in [4]).
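Two of the coefficients in (43) can be traced back to TABLE II with exact rational arithmetic. The sketch below assumes $f$ real and $|S| = 1$, and uses the leading-order relations read off the $\Lambda^0$, $\Lambda^1$ and $\Lambda^3$ rows.

It reproduces two coefficients: the $O(\epsilon)$ forcing coefficient $16/9$, and the $a_2^2f^2U_2$ coefficient $-(64/9)(23/15)$. It does not attempt the remaining terms of (43).

```python
from fractions import Fraction as F

# Leading-order relations read off the rows of TABLE II (assuming f real, |S| = 1):
#   Lambda^1:  (3/4) U1 = f S             ->  U1 = (4/3) f S
#   Lambda^3:  -(5/4) U3 + 2 a2 U1 U2 = 0  ->  U3 = (8/5) a2 U1 U2
#   Lambda^0:  U0 = -2 a2 (U1 conj(U1) + U2 conj(U2))
c1 = F(4, 3)  # U1 / (f S)
c3 = F(8, 5)  # U3 / (a2 U1 U2)

# O(eps) forcing of the Lambda^2 equation: eps a2 U1^2 = eps a2 (16/9) f^2 S^2.
forcing = c1 ** 2

# O(eps^2) terms proportional to a2^2 f^2 U2:
# (i) a2 * 2 U1 * dU1, where the Lambda^1 row gives dU1 = (4/3)(-2 eps a2 conj(U1) U2)
from_U1_correction = 2 * c1 * (c1 * -2 * c1)
# (ii) 2 a2 (U0 U2 + conj(U1) U3), keeping only the |U1|^2 parts
from_U0_and_U3 = 2 * (-2 * c1 ** 2 + c3 * c1 ** 2)

total = from_U1_correction + from_U0_and_U3
print(forcing, total, -F(64, 9) * F(23, 15))  # 16/9 -1472/135 -1472/135
```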
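TABLE II. TABULATED EQUATIONS INDICATING RELATIVE ORDER OF TERMS IN THE TRANSIENT, SUPERHARMONIC CASE

$$
\begin{array}{c|c|c|c|c}
 & \epsilon^0 & \epsilon^1 & \epsilon^2 & \text{RHS}\\ \hline
\Lambda^0 & 0 & \epsilon\left[U_0 + 2a_2\left(U_1\bar U_1 + U_2\bar U_2\right)\right] & \ast\ast\ast & 0\\
\Lambda^1 & \tfrac{3}{4}U_1 & i\left(\dfrac{dU_1}{dt} + \epsilon\mu U_1\right) + 2\epsilon a_2\bar U_1U_2 & \ast\ast\ast & fS\\
\Lambda^2 & 0 & 2i\dfrac{dU_2}{dt} + \epsilon\left(2i\mu U_2 + a_2U_1^2\right) & \dfrac{d^2U_2}{dt^2} + 2\epsilon\mu\dfrac{dU_2}{dt} + 2\epsilon^2a_2\left(U_0U_2 + \bar U_1U_3 + \bar U_2U_4\right) & 0\\
\Lambda^3 & 0 & \epsilon\left(-\tfrac{5}{4}U_3 + 2a_2U_1U_2\right) & \ast\ast\ast & 0\\
\Lambda^4 & 0 & \epsilon\left(-3U_4 + a_2U_2^2\right) & \ast\ast\ast & 0
\end{array}
$$

Note that, as in TABLE I, RHS indicates the right-hand side of the equation and *** indicates terms not needed for the present approximation.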
Abstract

A compact symmetric FFT algorithm for real and even data is implemented on a shared memory parallel processing computer. The parallel implementation is complicated by the uneven distribution of work induced by splitting symmetric sequences. A performance model is developed to predict the amount of speed-up that may be expected as the number of processors is increased. Factors included in the model are the arithmetic operations, calls to the transcendental libraries, and overhead for the fork-join operations. Actual processing times are given for the real and even FFT. For fixed N, speed-up curves are shown for increasing numbers of processors and are compared to the theoretical curves of the performance models. While the speed-up is excellent for long sequences, for short sequences the speed-up peaks at some intermediate number of processors.

Introduction

Since its introduction in 1965 [1], the Fast Fourier Transform (FFT) has become one of the most widely used algorithms of computational mathematics. Its enormous popularity is due largely to the fact that the FFT requires $O(N\log N)$ arithmetic operations to compute the transform of a complex vector of length N, instead of the $O(N^2)$ operations required to compute the transform as a matrix-vector product.

The term 'compact symmetric' refers to a family of FFT algorithms that use minimal storage and arithmetic for data sequences possessing certain symmetries. The first such algorithm, generally attributed to Edson [2], computes the transform of a real vector using half the storage and half the operations used by the original FFT. It has long been known that further savings are possible when the data have additional symmetries. With the exception of one little-publicized algorithm by Gentleman [3], however, such transforms were performed by pre- and post-processing of data for use with conventional FFTs [4]. In recent papers by Swarztrauber [5] and Briggs [6], compact algorithms were developed for sequences with real, even, odd, quarter wave even, and quarter wave odd symmetries.
The Cooley-Tukey Algorithm

Suppose a complex sequence $\{x_n\} = x_0, x_1, \ldots, x_{N-1}$ is given, for which the Discrete Fourier Transform (DFT) is desired. For convenience, assume that the length of the sequence, N, is a power of two. The DFT of the sequence is given by

$$X_k = \sum_{n=0}^{N-1} x_n\,\omega_N^{nk}, \qquad k = 0, 1, \ldots, N-1,$$

where $\omega_N = e^{-i2\pi/N}$. As a matrix-vector multiplication, the DFT requires $O(N^2)$ operations.

Suppose the sequence $\{x_n\}$ is split into two subsequences $\{y_n\}$ and $\{z_n\}$, whose elements are $y_n = x_{2n}$ and $z_n = x_{2n+1}$. Since $\omega_N^{2nk} = \omega_{N/2}^{nk}$, the DFT can be written as

$$X_k = \sum_{n=0}^{N/2-1} y_n\,\omega_{N/2}^{nk} + \omega_N^k\sum_{n=0}^{N/2-1} z_n\,\omega_{N/2}^{nk}, \qquad k = 0, 1, \ldots, N-1.$$

The two summations in this equation are themselves DFTs of the subsequences $\{y_n\}$ and $\{z_n\}$ respectively, which are denoted $\{Y_k\}$ and $\{Z_k\}$. Therefore the first half of the desired transform is given by

$$X_k = Y_k + \omega_N^kZ_k, \qquad k = 0, 1, \ldots, N/2 - 1.$$

The second half of the desired transform may be obtained by substituting $k \to k + N/2$ and noting the periodicity of $\{Y_k\}$ and $\{Z_k\}$ (and that $\omega_N^{N/2} = -1$), which yields

$$X_{k+N/2} = Y_k - \omega_N^kZ_k, \qquad k = 0, 1, \ldots, N/2 - 1.$$

These two formulas together make up the "butterfly relation", or combine formulas.

The Cooley-Tukey FFT algorithm proceeds by recursively splitting the input vector until eventually N sequences of length 1 are produced, which are their own DFTs. The butterfly relations may then be applied to build longer transforms from pairs of short ones. This process continues until finally two transforms of length N/2 are combined to form the length N transform of the original input sequence.
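The recursive splitting and the butterfly relations translate almost line for line into code. The following Python sketch is a plain recursive radix-2 FFT, not an optimized or in-place implementation. It is checked against the direct $O(N^2)$ DFT using the same sign convention, $\omega_N = e^{-i2\pi/N}$.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: X_k = sum_n x_n w_N^(nk), with w_N = exp(-2*pi*i/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]          # a length-one sequence is its own DFT
    Y = fft(x[0::2])                    # DFT of y_n = x_{2n}
    Z = fft(x[1::2])                    # DFT of z_n = x_{2n+1}
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * Z[k]
        X[k] = Y[k] + t                 # X_k       = Y_k + w_N^k Z_k
        X[k + N // 2] = Y[k] - t        # X_{k+N/2} = Y_k - w_N^k Z_k
    return X

x = [complex(n % 3, -n) for n in range(16)]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))))
```

The recursion depth is $\log_2 N$, and each level performs $N/2$ butterflies. This is the source of the $O(N\log N)$ operation count quoted in the Introduction.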


An Algorithm for Real and Even Data

An even sequence is one in which $x_n = x_{N-n}$. A sequence which is both real and even (E) can be shown to have a transform which is also an E sequence. Suppose the transform of an E sequence is desired. Following a process suggested by the Cooley-Tukey algorithm, the input sequence is split into its even and odd subsequences. These subsequences are split in turn, and this recursive process is followed until length-one sequences are produced. With each splitting the subsequences inherit certain symmetries from the parent sequence.
Suppose an E sequence $\{x_n\}$ is split into two subsequences: $\{y_n\}$, consisting of the elements with even-numbered indices, and $\{z_n\}$, the elements whose indices are odd. Then

$$y_n = x_{2n} = x_{N-2n} = y_{N/2-n}$$

and

$$z_n = x_{2n+1} = x_{N-2n-1} = z_{N/2-n-1}.$$

Therefore the subsequence $\{y_n\}$ is an E sequence, while the subsequence $\{z_n\}$ is real and has a new symmetry called quarter wave even.
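The inherited symmetries can be confirmed numerically. This Python sketch builds a random real even sequence and checks two properties: its even-indexed half is again even, and its odd-indexed half is quarter wave even. The helper names are hypothetical.

```python
import random

def random_even(N, seed=0):
    """Random real even (E) sequence of length N: x_n = x_{N-n}, indices taken mod N."""
    rng = random.Random(seed)
    x = [0.0] * N
    for n in range(N // 2 + 1):
        x[n] = x[(N - n) % N] = rng.uniform(-1.0, 1.0)
    return x

def is_even(y):
    """True if y_n = y_{M-n} (mod M) for every n."""
    M = len(y)
    return all(y[n] == y[(M - n) % M] for n in range(M))

def is_quarter_wave_even(z):
    """True if z_n = z_{M-1-n} for every n."""
    M = len(z)
    return all(z[n] == z[M - 1 - n] for n in range(M))

x = random_even(16)
y, z = x[0::2], x[1::2]   # even- and odd-indexed subsequences
print(is_even(y), is_quarter_wave_even(z))  # True True
```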

A quarter wave even sequence of length N is one in which $x_n = x_{N-n-1}$. If the DFT of a real quarter wave even (QE) sequence is considered, it can be shown that the transform has the symmetry $X_k = e^{i2\pi k/N}\bar X_k$. Therefore it is possible to represent both the real and imaginary parts of the sequence element $X_k$ by a strictly real quantity, namely $\tilde X_k = e^{-i\pi k/N}X_k$.

Suppose a QE sequence is split into its even-numbered elements $y_n = x_{2n}$ and odd-numbered elements $z_n = x_{2n+1}$. Each of the resulting subsequences has no symmetry by itself (except that it is real), but taken together they have the intrasequence symmetry

$$z_n = x_{2n+1} = x_{N-2n-2} = y_{N/2-n-1}.$$

Thus a real QE sequence splits into two strictly real subsequences, one of which is redundant.
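Similarly, two properties of a real QE sequence stated above can be spot-checked with a direct DFT. The first is the transform symmetry $X_k = e^{i2\pi k/N}\bar X_k$, which makes $\tilde X_k$ real. The second is the redundancy of one half after splitting. The sketch assumes $\omega_N = e^{-i2\pi/N}$, as in the DFT definition; helper names are illustrative.

```python
import cmath
import random

def dft(x):
    """Direct DFT with w_N = exp(-2*pi*i/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def random_qe(N, seed=0):
    """Random real quarter wave even (QE) sequence: x_n = x_{N-1-n}."""
    rng = random.Random(seed)
    x = [0.0] * N
    for n in range(N // 2):
        x[n] = x[N - 1 - n] = rng.uniform(-1.0, 1.0)
    return x

def qe_checks(N, seed=0):
    x = random_qe(N, seed)
    X = dft(x)
    # (a) X_k = exp(i 2 pi k / N) conj(X_k)
    sym = max(abs(X[k] - cmath.exp(2j * cmath.pi * k / N) * X[k].conjugate())
              for k in range(N))
    # (b) X~_k = exp(-i pi k / N) X_k has zero imaginary part
    imag = max(abs((cmath.exp(-1j * cmath.pi * k / N) * X[k]).imag) for k in range(N))
    # (c) the odd-indexed elements repeat the even-indexed ones in reverse order
    y, z = x[0::2], x[1::2]
    M = N // 2
    redundant = all(z[n] == y[M - 1 - n] for n in range(M))
    return sym, imag, redundant

print(qe_checks(16))
```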

A strictly real sequence (R) is one in which each element is its own complex conjugate. Substituting this relationship into the definition of the DFT, it is easy to show that the transform of an R sequence $\{x_n\}$ has the conjugate symmetric property $X_k = \bar X_{N-k}$. Splitting an R sequence produces two R sequences which are half the length of the original. No additional symmetry is induced by the splitting of an R sequence.

A symmetric algorithm, proposed by Swarztrauber, is schematically illustrated in Figure 1. The input sequence is split, producing one E subsequence and one QE sequence. The E subsequence splits into another E and QE pair, while the QE sequence splits into two real R sequences, one of which is redundant. The splitting process continues in the same way. Each E generates an E and QE pair, each QE splits into an R (and a redundant R), and each R sequence splits into two more R sequences. This continues until sequences of length one are produced (left of the bar in Figure 1). Note that no arithmetic has yet been performed; the data have merely been moved and are now said to be in scrambled order.

Since the transform of a length-one sequence is itself, the short transforms may now be combined into longer transforms. This process may be applied recursively until the transform of the full-length input sequence is obtained. In order to do this recombination, it is necessary to have butterfly relations that combine the various symmetric transforms into transforms of longer symmetric sequences. Swarztrauber derived these butterfly relations by substituting the transform symmetries discussed above into the Cooley-Tukey combine formulas and simplifying.

The Combine Formulas

If an R sequence $\{x_n\}$ has been split into its two subsequences, $\{y_n\}$ and $\{z_n\}$, the butterfly relations for constructing the transform $\{X_k\}$ of the original sequence are

$$X_k = Y_k + \omega_N^kZ_k, \qquad k = 0, 1, \ldots, N/4$$

and

$$X_{N/2-k} = \overline{Y_k - \omega_N^kZ_k}, \qquad k = 0, 1, \ldots, N/4 - 1.$$

These equations are called the 'RtoR' combine formulas, because they combine the transforms of real sequences into the transform of a real sequence.
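A direct numerical check of the RtoR relations is sketched below in Python. It assumes that $N \ge 4$ is a power of two, and it returns only $X_0, \ldots, X_{N/2}$; the remaining coefficients follow from $X_{N-k} = \bar X_k$. The name `rtor_combine` is hypothetical and is not taken from the authors' program.

```python
import cmath
import random

def dft(x):
    """Direct DFT with w_N = exp(-2*pi*i/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def rtor_combine(Y, Z):
    """RtoR butterflies: X_0..X_{N/2} of a real length-N sequence from the DFTs
    Y, Z (length M = N/2) of its even- and odd-indexed halves; needs N >= 4."""
    M = len(Y)
    N = 2 * M
    X = [0j] * (M + 1)
    for k in range(M // 2 + 1):                 # k = 0, 1, ..., N/4
        X[k] = Y[k] + cmath.exp(-2j * cmath.pi * k / N) * Z[k]
    for k in range(M // 2):                     # k = 0, 1, ..., N/4 - 1
        X[M - k] = (Y[k] - cmath.exp(-2j * cmath.pi * k / N) * Z[k]).conjugate()
    return X

rng = random.Random(3)
x = [rng.uniform(-1.0, 1.0) for _ in range(16)]   # a random R sequence
Xc = rtor_combine(dft(x[0::2]), dft(x[1::2]))
X = dft(x)
print(max(abs(Xc[k] - X[k]) for k in range(len(Xc))))
```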

It was shown above that an E sequence $\{x_n\}$ splits into an E and a QE subsequence. The transform of the E subsequence, $\{Y_k\}$, is also real and even. The transform of the QE subsequence, $\{Z_k\}$, can be represented by the strictly real sequence $\{\tilde Z_k\}$, with $\tilde Z_k = \omega_N^kZ_k$. The butterfly relations for combining the transforms of an E and a QE sequence are then

$$X_k = Y_k + \tilde Z_k, \qquad k = 0, 1, \ldots, N/4$$

and

$$X_{N/2-k} = Y_k - \tilde Z_k, \qquad k = 0, 1, \ldots, N/4 - 1.$$

These relations, which are together called an "EQE" type combination, produce the first N/2 + 1 values of the transform $\{X_k\}$. Since $\{X_k\}$ is real and even, the remaining values may be obtained by symmetry.
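The EQE relations can be checked in the same way. In this sketch the QE transform is supplied through its real representative $\tilde Z_k = \omega_N^kZ_k$, and the result is compared with a direct DFT of a random E sequence. The function name is hypothetical.

```python
import cmath
import random

def dft(x):
    """Direct DFT with w_N = exp(-2*pi*i/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def eqe_combine(Y, Zt):
    """EQE butterflies: X_0..X_{N/2} of a real even length-N sequence from the real
    transform Y of its E half and the real representative Zt of its QE half."""
    M = len(Y)                      # M = N/2
    X = [0.0] * (M + 1)
    for k in range(M // 2 + 1):     # k = 0, 1, ..., N/4
        X[k] = Y[k] + Zt[k]
    for k in range(M // 2):         # k = 0, 1, ..., N/4 - 1
        X[M - k] = Y[k] - Zt[k]
    return X

N = 16
rng = random.Random(1)
x = [0.0] * N
for n in range(N // 2 + 1):
    x[n] = x[(N - n) % N] = rng.uniform(-1.0, 1.0)   # real E sequence
Y = [c.real for c in dft(x[0::2])]
Z = dft(x[1::2])
Zt = [(cmath.exp(-2j * cmath.pi * k / N) * Z[k]).real for k in range(N // 2)]
Xc = eqe_combine(Y, Zt)
X = dft(x)
print(max(abs(Xc[k] - X[k]) for k in range(N // 2 + 1)))
```

Only real additions and subtractions are needed in the EQE step itself, which is one source of the savings over a general complex FFT.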

It has been noted that in splitting a QE sequence $\{x_n\}$ one of the resulting subsequences is redundant. Therefore the transform $\{X_k\}$, which can be represented by the strictly real sequence $\{\tilde X_k\}$, can be recovered entirely from the transform $\{Y_k\}$ of $\{y_n\}$. The transform of $\{z_n\}$ need be neither computed nor stored. The butterfly relations by which the transform of a QE sequence can be formed from the transform of one of its R subsequences are

$$\tilde X_k = 2\,\mathrm{Re}\left[\omega_{2N}^kY_k\right], \qquad k = 0, 1, \ldots, N/4$$

and

$$\tilde X_{N/2-k} = -2\,\mathrm{Im}\left[\omega_{2N}^kY_k\right], \qquad k = 0, 1, \ldots, N/4 - 1.$$

Thus N/2 real values are required to represent the N complex values of the transform of a real and quarter wave even sequence. This set of relations is called an "RRQE" combine.

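The RRQE relations can be exercised in the same manner. Only the transform of the even-indexed half is used, and the output is compared with $\tilde X_k = e^{-i\pi k/N}X_k$ computed from a direct DFT. As before, this is a sketch with hypothetical names, assuming $N \ge 4$ and $\omega_{2N} = e^{-i\pi/N}$.

```python
import cmath
import random

def dft(x):
    """Direct DFT with w_N = exp(-2*pi*i/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def rrqe_combine(Y, N):
    """RRQE butterflies: real representative Xt_k = exp(-i pi k/N) X_k, k = 0..N/2,
    of the DFT of a real QE sequence of length N, built from the DFT Y (length N/2)
    of its even-indexed half only. Needs N >= 4."""
    Xt = [0.0] * (N // 2 + 1)
    for k in range(N // 4 + 1):                   # k = 0, 1, ..., N/4
        Xt[k] = 2 * (cmath.exp(-1j * cmath.pi * k / N) * Y[k]).real
    for k in range(N // 4):                       # k = 0, 1, ..., N/4 - 1
        Xt[N // 2 - k] = -2 * (cmath.exp(-1j * cmath.pi * k / N) * Y[k]).imag
    return Xt

N = 16
rng = random.Random(2)
x = [0.0] * N
for n in range(N // 2):
    x[n] = x[N - 1 - n] = rng.uniform(-1.0, 1.0)   # real QE sequence
Xt = rrqe_combine(dft(x[0::2]), N)                  # the odd half is never touched
X = dft(x)
Xt_ref = [(cmath.exp(-1j * cmath.pi * k / N) * X[k]).real for k in range(N // 2 + 1)]
print(max(abs(a - b) for a, b in zip(Xt, Xt_ref)))
```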
The combine phase of the algorithm is shown schematically by the right side of Figure 1. In general, each pass has one EQE combination, followed by an RRQE, followed by a series of RtoR combinations. At the second-to-last pass, there are only the RRQE and EQE types, while the final pass involves only the EQE combination, performed on sequences of half the length of the original input. In practice none of the redundant R sequences are computed or stored. The program uses only N/2 + 1 storage locations. The algorithm begins with the input data in scrambled order and proceeds through $\log_2 N$ passes, until the transform coefficients are produced in natural order.

The  Inverted  Algorithm 

It is generally more convenient to have an algorithm which operates on the input data in natural order, producing the coefficients in scrambled order. Briggs [5] developed such algorithms for sequences which are real, quarter wave even, and quarter wave odd. Following his lead, an ordered-to-scrambled algorithm for an E sequence may be developed.

To derive this algorithm it is necessary to formally invert the Swarztrauber algorithm. Since an E sequence is itself the transform of another E sequence, the inverted algorithm can be thought of as following Figure 1 backwards, from the right side to the middle bar. Beginning with the transform of an E sequence in natural order, it is possible to 'uncombine' it into the transforms of its E and QE subsequences. These in turn are uncombined into the transforms of their subsequences, and so on. After $\log_2 N$ passes through the data, length one transforms are produced, which are the transform coefficients, in scrambled order, of the original sequence.

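To invert the Swarztrauber algorithm, it is necessary to formally invert all of the combine formulas. The EQE combine relations are easy to invert, and produce the uncombine relations

$$Y_k = \frac{1}{2}\left(X_k + X_{N/2-k}\right), \qquad k = 0, 1, \ldots, \frac{N}{4}$$

and

$$\tilde{Z}_k = \frac{1}{2}\left(X_k - X_{N/2-k}\right), \qquad k = 0, 1, \ldots, \frac{N}{4} - 1.$$

Inverting the RRQE combine formulas is a bit more tedious, leading to the uncombine relations

$$\mathrm{Re}[Y_k] = \frac{1}{2}\left(\tilde{X}_k \cos\frac{\pi k}{N} + \tilde{X}_{N/2-k} \sin\frac{\pi k}{N}\right), \qquad k = 0, 1, \ldots, \frac{N}{4}$$

and

$$\mathrm{Im}[Y_k] = \frac{1}{2}\left(\tilde{X}_k \sin\frac{\pi k}{N} - \tilde{X}_{N/2-k} \cos\frac{\pi k}{N}\right), \qquad k = 1, 2, \ldots, \frac{N}{4} - 1.$$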
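The EQE and RRQE uncombine formulas can be sketched as plain array operations and compared against direct FFTs of the subsequences. This is a hypothetical illustration, not the author's program: it assumes NumPy's sign convention, applies the factor of one-half immediately instead of deferring it to the end, and pads the unstored value $\tilde{X}_{N/2}$ (which is zero for QE data) so the $k = 0$ case needs no special branch.

```python
import numpy as np


def eqe_uncombine(X, N):
    """X holds the real values X_0 .. X_{N/2} of an E transform of length N.
    Returns Y_0 .. Y_{N/4} and Zt_0 .. Zt_{N/4-1} (factor 1/2 applied here)."""
    k = np.arange(N // 4 + 1)
    Y = 0.5 * (X[k] + X[N // 2 - k])
    kz = np.arange(N // 4)
    Zt = 0.5 * (X[kz] - X[N // 2 - kz])
    return Y, Zt


def rrqe_uncombine(Xt, N):
    """Xt holds the real values Xt_0 .. Xt_{N/2-1} of a QE transform of length N.
    Returns the complex values Y_0 .. Y_{N/4} of its even-indexed R subsequence."""
    Xt = np.append(Xt, 0.0)                    # Xt_{N/2} is zero for QE data
    k = np.arange(N // 4 + 1)
    c, s = np.cos(np.pi * k / N), np.sin(np.pi * k / N)
    fwd, bwd = Xt[k], Xt[N // 2 - k]           # forward and backward running indices
    return 0.5 * (fwd * c + bwd * s) + 0.5j * (fwd * s - bwd * c)


def uncombine_round_trip(N=16, seed=2):
    """Compare both uncombines with FFTs of the subsequences (NumPy convention)."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(N // 2 + 1)
    x_e = np.concatenate([h, h[1:N // 2][::-1]])            # E sequence
    Y, Zt = eqe_uncombine(np.fft.fft(x_e).real[:N // 2 + 1], N)
    Z_ref = np.exp(-2j * np.pi * np.arange(N // 2) / N) * np.fft.fft(x_e[1::2])
    ok_e = (np.allclose(Y, np.fft.fft(x_e[0::2]).real[:N // 4 + 1])
            and np.allclose(Zt, Z_ref.real[:N // 4]))
    g = rng.standard_normal(N // 2)
    x_q = np.concatenate([g, g[::-1]])                      # QE sequence
    Xt = (np.exp(-1j * np.pi * np.arange(N) / N) * np.fft.fft(x_q)).real[:N // 2]
    ok_q = np.allclose(rrqe_uncombine(Xt, N), np.fft.fft(x_q[0::2])[:N // 4 + 1])
    return ok_e and ok_q


print(uncombine_round_trip(16), uncombine_round_trip(64))
```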

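Inverting the RtoR combine formulas, and at the same time separating real and imaginary parts for storage in a real array, leads to the following four relations:

$$\mathrm{Re}[Y_k] = \frac{1}{2}\left(\mathrm{Re}[X_k] + \mathrm{Re}[X_{N/2-k}]\right)$$

$$\mathrm{Im}[Y_k] = \frac{1}{2}\left(\mathrm{Im}[X_k] - \mathrm{Im}[X_{N/2-k}]\right)$$

$$\mathrm{Re}[Z_k] = \frac{1}{2}\left\{\left(\mathrm{Re}[X_k] - \mathrm{Re}[X_{N/2-k}]\right)\cos\frac{2\pi k}{N} - \left(\mathrm{Im}[X_k] + \mathrm{Im}[X_{N/2-k}]\right)\sin\frac{2\pi k}{N}\right\}$$

$$\mathrm{Im}[Z_k] = \frac{1}{2}\left\{\left(\mathrm{Re}[X_k] - \mathrm{Re}[X_{N/2-k}]\right)\sin\frac{2\pi k}{N} + \left(\mathrm{Im}[X_k] + \mathrm{Im}[X_{N/2-k}]\right)\cos\frac{2\pi k}{N}\right\}$$

where the real parts of both $\{Y_k\}$ and $\{Z_k\}$ are calculated for $k = 0, 1, \ldots, N/4$ and the imaginary parts of both sequences are calculated for $k = 0, 1, \ldots, N/4 - 1$.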
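The four RtoR uncombine relations translate directly into vectorized code. The sketch below is a hypothetical illustration assuming NumPy's DFT sign convention; for simplicity it also evaluates the imaginary parts at $k = N/4$, where they come out as zero.

```python
import numpy as np


def rtor_uncombine(X, N):
    """X holds X_0 .. X_{N/2} of the DFT of a real sequence of length N.
    Returns Y_k, Z_k (transforms of its even/odd-indexed halves), k = 0..N/4."""
    k = np.arange(N // 4 + 1)
    Xk, Xb = X[k], X[N // 2 - k]               # forward and backward running indices
    c, s = np.cos(2 * np.pi * k / N), np.sin(2 * np.pi * k / N)
    u = Xk.real - Xb.real                      # Re[X_k] - Re[X_{N/2-k}]
    v = Xk.imag + Xb.imag                      # Im[X_k] + Im[X_{N/2-k}]
    Y = 0.5 * (Xk.real + Xb.real) + 0.5j * (Xk.imag - Xb.imag)
    Z = 0.5 * (u * c - v * s) + 0.5j * (u * s + v * c)
    return Y, Z


rng = np.random.default_rng(3)
N = 32
x = rng.standard_normal(N)
Y, Z = rtor_uncombine(np.fft.fft(x)[:N // 2 + 1], N)
print(np.allclose(Y, np.fft.fft(x[0::2])[:N // 4 + 1]),
      np.allclose(Z, np.fft.fft(x[1::2])[:N // 4 + 1]))
```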

The  remainder  of  this  paper  is  concerned  only  with  the  ordered-to-scrambled 
algorithm,  so  the  names  EQE,  RRQE,  and  RtoR  are  retained  for  these  uncombine 
formulas. 

Savings  from  the  Compact  Symmetric  Algorithm 

The data flow and storage of the ordered-to-scrambled algorithm are illustrated in Figure 2 for a real even sequence of length $N=32$. During each pass through the data, the first type of uncombine is an EQE. In the first pass this is the only type of uncombine. Beginning with the second pass, the RRQE type uncombine follows the EQE, and with each succeeding pass there is one EQE, one RRQE, and all remaining uncombines are of type RtoR.

To analyze the algorithm, let the passes through the data be indexed $j = 0, 1, \ldots, \log_2 N - 2$. The last pass, $j = \log_2 N - 1$, is considered as a special case. The scalar multiplication by one-half occurs in all of the formulas, and may be performed at the end of the algorithm.

The backwards running index $(N/2 - k)$ is important in the EQE uncombines. Because of it, individual EQE butterflies cannot be performed in place. However, pairs of EQE butterflies can be performed in place if performed together, as an EQE "unit". The $j$th pass through the data requires $\frac{N}{8}\left(\frac{1}{2}\right)^j$ such units, beginning with pass $j=0$. On the $j$th pass, $\frac{N}{4}\left(\frac{1}{2}\right)^j$ RRQE butterflies are required, beginning with pass $j=1$. Beginning with pass $j=2$, each RtoR sequence requires a number of RtoR butterflies that decreases with each pass, while the number of such sequences increases. The last pass through the data is considered separately because in this pass the sequences are all of length two and all the butterfly types reduce to a butterfly which is identical to the EQE.

Noting that each of the combine types requires a different amount of work, and using the counting arguments just listed, it is possible to compute a total operation count for the algorithm. The transform of an E sequence of length $N$ using this algorithm requires $N/2+1$ storage locations, and the total number of real arithmetic operations (counting multiplications and additions equally) is $\frac{5}{4}N\log_2 N - 2N$.

Performing the transform of an E sequence by placing the input sequence into the real part of a complex array and using a conventional FFT requires $2N$ storage locations and a real operation count of $5N\log_2 N$. Thus the compact symmetric FFT requires one fourth the storage of its conventional counterpart, and somewhat less than one fourth the arithmetic. Performing the same transform by traditional pre- and post-processing methods [4] utilizes the same number of storage locations and entails $\frac{5}{4}N\log_2 N + O(N)$ real operations, somewhat more than the compact symmetric transform.
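The storage and operation counts can be compared directly. This small sketch only evaluates the counts stated for the compact algorithm ($N/2+1$ locations, $\frac{5}{4}N\log_2 N - 2N$ operations) and for the complex-array approach ($2N$ locations, $5N\log_2 N$ operations), assuming $N$ is a power of two; the pre- and post-processing method is not modeled, and the function names are hypothetical.

```python
import math


def compact_e_cost(N):
    """(storage locations, real operations) for the compact symmetric E transform."""
    n = int(math.log2(N))
    return N // 2 + 1, 5 * N * n // 4 - 2 * N


def complex_fft_cost(N):
    """(storage locations, real operations) with the data in a complex array."""
    n = int(math.log2(N))
    return 2 * N, 5 * N * n


for N in (1024, 65536):
    (s_c, w_c), (s_f, w_f) = compact_e_cost(N), complex_fft_cost(N)
    # The operation ratio approaches 1/4 from below as N grows.
    print(N, s_c, s_f, w_c, w_f, round(w_c / w_f, 3))
```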

Parallel  FFTs 

Before proceeding to the problem of parallelizing the transforms of symmetric sequences, it is useful to review some of the features of parallel FFTs for complex sequences. Many of the problems encountered in developing the parallel symmetric algorithms are similar to those that arise in parallelizing the conventional FFTs. Briggs [7] developed strategies for implementing FFTs on shared memory parallel processors.

The fundamental work unit of the FFT is the butterfly relation. During every pass through the data each butterfly relation can be performed in place (without using an extra storage array) and independently of all other butterflies. It is at this level of the algorithm that parallelization may occur. The $\log_2 N$ passes through the data must be performed sequentially, so there is no parallelism at a coarser level.

The two basic strategies for parallelizing an FFT are called scheduling-on-pairs and scheduling-on-$\omega$. In the former strategy, each processor is assigned independent butterflies to perform. Suppose there are $p$ processors. The butterflies are passed out by giving the $i$th processor the $i$th butterfly, and every $p$th butterfly thereafter. Prior to performing the butterfly, the processor must calculate the appropriate power of $\omega$. At the end of each pass through the data, each processor must wait in a synchronization step until all other processors have completed that pass.
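Scheduling-on-pairs amounts to a round-robin assignment. A minimal sketch, with a hypothetical function name and processors and butterflies numbered from zero:

```python
def schedule_on_pairs(num_butterflies, p):
    """Processor i gets butterfly i and every p-th butterfly after it.
    Each processor computes the power of omega for each butterfly it performs."""
    return [list(range(i, num_butterflies, p)) for i in range(p)]


print(schedule_on_pairs(8, 3))   # [[0, 3, 6], [1, 4, 7], [2, 5]]
```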

During each of the $\log_2 N$ passes through the data, the number of powers of $\omega$ required changes. At the $k$th pass, there are $2^{k-1}$ distinct powers required. This fact gives rise to the scheduling-on-$\omega$ strategy. If each processor is assigned every $p$th butterfly, as suggested above, then several processors may have to compute the same powers of $\omega$ as they stride through the data, an obvious duplication of effort. This is unavoidable in the early passes of the algorithm, where $2^{k-1} < p$. During the later passes, where $2^{k-1} > p$, each processor is assigned all the butterflies corresponding to a given set of powers of $\omega$. This strategy avoids the duplication of effort in having several processors compute the same sets of exponentials.
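Scheduling-on-$\omega$ can be sketched in the same style. The sketch assumes a radix-2 pass $k$ with $N/2$ butterflies in which butterfly $b$ uses power index $b \bmod 2^{k-1}$; that indexing and the function name are hypothetical. Each processor computes only the powers it owns, so every distinct power is computed exactly once.

```python
def schedule_on_omega(N, p, k):
    """Hypothetical scheduling-on-omega plan for pass k (1-based) of a radix-2
    FFT of length N, assuming butterfly b uses power index b % 2**(k-1).
    Returns one (powers computed, butterflies performed) pair per processor."""
    m = 2 ** (k - 1)                          # distinct powers of omega on pass k
    if m < p:
        # early passes: too few distinct powers, so scheduling-on-pairs is used
        raise ValueError("fewer distinct powers than processors on this pass")
    plan = []
    for i in range(p):
        powers = list(range(i, m, p))         # powers owned (and computed once) by i
        owned = set(powers)
        butterflies = [b for b in range(N // 2) if b % m in owned]
        plan.append((powers, butterflies))
    return plan


print(schedule_on_omega(16, 2, 3))
```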

Briggs [7] implemented both of these parallelization strategies on the Denelcor HEP computer. It was found that scheduling-on-$\omega$ ran faster than scheduling-on-pairs by 25% (for small $N$) to 80% (for large $N$).

The  Parallel  E  Algorithm 

The parallel algorithm for computing the transform of an E sequence may now be developed. Only the ordered-to-scrambled case is considered, but the commentary and analysis extend readily to the scrambled-to-ordered case, although the computational details differ. Further, for shared memory computers, the extensions are immediate to parallel algorithms for real, real and odd, and real quarter wave (even or odd) sequences. For consideration of symmetric FFTs on distributed memory architectures, see Sweet [8] or Henson [9].

The basic assumption regarding the hardware is that the number of processors is small compared to the sequence lengths (coarse grained processing), and that all of the processors share a common memory. This means that there are no explicit communication costs in the algorithm. There will, however, be some overhead that must be paid for fork-join operations, and there will be some implicit communication cost in the form of memory bus contention. Since all the processors have equal access to all of the data, the algorithm is distributed among the processors.

At the beginning of the $j$th pass through the data ($j = 0, 1, \ldots, \log_2 N - 1$), copies of the subroutine are 'forked' to the processors. The 'units' of type EQE are then distributed as evenly as possible across the processors. If the current pass is not the first, RRQE butterflies are distributed as evenly as possible across the processors. After the second pass, RtoR butterflies are required. The number of RtoR butterflies per sequence decreases with each pass, but the number of sequences increases. There are two cases that must be handled. If the number of RtoR butterflies per sequence is greater than the number of processors, the algorithm distributes the butterflies as evenly as possible across the processors, each of which strides through the sequences performing its designated butterflies. This mode is called scheduling-on-butterflies. If, however, there are fewer butterflies per sequence than processors, then the algorithm distributes the sequences across the processors as best it can, and each processor must compute all the butterflies for each of its sequences. This mode is called scheduling-on-sequences. At the end of each pass the processors are joined in a synchronization step. If the current pass is the last pass, all the butterflies are of the EQE type, and these are distributed across the processors.
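The choice between the two RtoR modes can be sketched as follows. In this hypothetical sketch the number of sequences and butterflies per sequence are passed in as parameters (their exact values on each pass are not modeled), round-robin is one way of distributing work "as evenly as possible", and treating the equal case as scheduling-on-butterflies is an assumption.

```python
def distribute_rtor(num_sequences, butterflies_per_seq, p):
    """Return, for each of p processors, its list of (sequence, butterfly) pairs."""
    work = [[] for _ in range(p)]
    on_butterflies = butterflies_per_seq >= p      # equal case: an assumption
    for s in range(num_sequences):
        for b in range(butterflies_per_seq):
            # scheduling-on-butterflies: butterfly b of every sequence goes to b % p
            # scheduling-on-sequences: every butterfly of sequence s goes to s % p
            owner = b % p if on_butterflies else s % p
            work[owner].append((s, b))
    return work


print(distribute_rtor(2, 4, 2))
print(distribute_rtor(3, 2, 4))
```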
Considering the second parallel strategy, there are two causes of decreased parallelism. The first is simple divisibility. When the number of processors does not divide the number of work units to be performed, there will be a time in which some processors are busy while others must wait. The second cause is the duplication of effort required when the algorithm switches from scheduling-on-butterflies to scheduling-on-sequences. In scheduling-on-butterflies, each processor need only calculate one set of cosines for each set of butterflies it performs. When scheduled on sequences, however, all the processors must calculate all of the cosines for each sequence, implying duplication of effort.
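Both losses can be quantified by simple counting. In this hypothetical sketch a work unit takes one time step, and the cosine counts follow the description above: under scheduling-on-butterflies the cosines for each butterfly index are computed once by the processor that owns it, while under scheduling-on-sequences they are recomputed for every sequence.

```python
import math


def steps(work_units, p):
    """Time steps for a pass when each processor does one unit per step."""
    return math.ceil(work_units / p)


def idle_slots(work_units, p):
    """Processor-steps spent waiting because p does not divide the work."""
    return p * steps(work_units, p) - work_units


def cosine_evaluations(num_sequences, butterflies_per_seq, on_sequences):
    """Cosine sets computed on one RtoR pass under each scheduling mode."""
    if on_sequences:
        return num_sequences * butterflies_per_seq   # recomputed per sequence
    return butterflies_per_seq                       # shared across sequences


print(steps(10, 4), idle_slots(10, 4))                                  # 3 2
print(cosine_evaluations(8, 2, False), cosine_evaluations(8, 2, True))  # 2 16
```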


Complexity  of  the  Parallel  Algorithm 

To predict the speedup due to the parallel implementation, consideration must be given to several factors: the changing amount of work of each uncombine type, the cost of the change from scheduling-on-butterflies to scheduling-on-sequences, the divisibility problems, and the cost of the fork-join operation. This leads to an analytic expression involving six terms: the cost of the fork-joins, the cost of the EQE units, the cost of the RRQE butterflies, the cost of the RtoR scheduled on butterflies, the cost of the RtoR scheduled on sequences, and the cost of the last pass through the data. This can be written:
where $a$ is the cost of one real addition, $c$ the cost of obtaining a cosine from the transcendental library, $A$ the cost of an RRQE butterfly, $B_2$ the cost of an RtoR butterfly, and $B_1$ the cost of an RtoR butterfly without the cosines. $L_T$ is the index of the first pass through the data in which the RtoR portion must be scheduled on sequences, rather than butterflies.

The overhead for forking operations is given by the expression

$$T_f(p, N) = \alpha_1 + \beta_1(p - 1) + (\log_2 N - 1)\left(\alpha_2 + \beta_2(p - 1)\right)$$

where $\alpha_1$ is the cost of the first fork on the first processor and $\beta_1$ is the cost per additional processor for the first fork. All succeeding fork calls have a cost of $\alpha_2$ for the first processor and $\beta_2$ for each additional processor.
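The fork overhead expression is easy to evaluate for trial parameter values. The sketch assumes $N$ is a power of two; the parameter values in the usage line are made up for illustration and are not Sequent measurements.

```python
def fork_overhead(p, N, alpha1, beta1, alpha2, beta2):
    """T_f(p, N) = alpha1 + beta1 (p-1) + (log2 N - 1)(alpha2 + beta2 (p-1))."""
    log2N = N.bit_length() - 1                          # exact for powers of two
    first = alpha1 + beta1 * (p - 1)                    # the first fork
    later = (log2N - 1) * (alpha2 + beta2 * (p - 1))    # the remaining log2 N - 1 forks
    return first + later


print(fork_overhead(4, 1024, alpha1=10, beta1=2, alpha2=5, beta2=1))  # 88
```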
The complexity equation is difficult to analyze because of the least integer function which occurs in most of the terms. In cases where the number of processors is a power of two, the least integer functions are easily computed, and after some algebraic labor, the complexity equation reduces to


T,  =  ^ 


1.  ..  C.  ,  (3a+c-Bi) 

-log2N  +  — log2P+ - - - 


+  (i4+4o+2c)Iog2p  ~  2clog2iV— +  2c  +  Bi 


Regrouping the terms of this equation, the structure of the performance model consists of four terms:

$$T_p = T_f(p, N) + O\left(\frac{N}{p}\log_2 N\right) + O\left(\frac{N}{p}\log_2 p\right) + O(\log_2 p)$$

Each of the terms of this equation can be identified with the phenomenon it represents. The first, $T_f(p, N)$, is the overhead required to fork processes. The $O(\frac{N}{p}\log_2 N)$ term represents perfect speedup relative to the serial algorithm. The remaining two terms reflect decreased parallelism. The first, $O(\frac{N}{p}\log_2 p)$, is the amount of time spent in the duplication of effort caused by changing from RtoR scheduled on butterflies to RtoR scheduled on sequences. The last term, $O(\log_2 p)$, represents the amount of time spent in EQE and RRQE butterflies after the sequences become sufficiently short that there are fewer of these types of butterflies than there are processors.

The  Parallel  Implementation 

The algorithm taking ordered E data to scrambled coefficients was implemented on a Sequent Balance multiprocessor. The maximum number of processors available to one user was 23. All of the processors had access, through a common bus, to all of the data. To compute predicted performance curves, the timing parameters of the machine were obtained from the Sequent documentation, and then verified by experiment.

Timings of the actual transforms were obtained for sequences of various lengths, and speedups were compared with the values predicted by the performance model. The results are shown in two separate charts: Figure 3 shows the performance characteristics of long sequences and Figure 4 displays those of shorter sequences.

For very long sequences ($N = 32768$, $N = 65536$, $N = 131072$), the implementation performed very well. There is good speedup throughout, with speedup generally increasing with the number of processors. The maximum speedup achieved was just over 17, occurring on the longest sequence when transformed on 21 processors. The open circles represent the predicted speedups from the model. For long sequences, the actual speedup very nearly matches the model speedup.

On shorter sequences ($N = 16384$, $N = 8192$, $N = 4096$), the implementation performed less well, both from the standpoint of measured speedup alone and when compared with the model speedup. In all cases, there is a significant decrease in the efficiency as the number of processors is increased, and on each curve there is an 'optimal' number of processors, after which the transform requires more time to perform as the number of processors is increased. On a sequence of length 16384, for example, the best performance was achieved using 17 processors, resulting in a speedup of approximately 8. On an 8192 point sequence, however, the best results occurred with 9 processors, but achieved a speedup of only 5. Additionally, as sequence lengths become shorter, the actual performance differs more and more from the predicted curve. This may be attributed to two factors. First, the overhead of loop indexing is not included in the model. As sequences become shorter, the loop indexing represents an increasing fraction of the algorithm. A more significant factor is related to the memory management of the Sequent Balance. Essentially, as sequences become shorter, the memory accesses by the processors become more frequent, and the bus becomes saturated. (It should be noted that the new generation of Sequent multiprocessors, the Symmetry family, utilizes a different memory management scheme designed specifically to eliminate this effect.)
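Parallel efficiency (speedup divided by the number of processors) puts the quoted results on a common scale. The sketch uses only the approximate speedups stated in the text; the rounding is ours and the dictionary name is hypothetical.

```python
quoted = {  # (sequence length N, processors p): approximate measured speedup
    (131072, 21): 17.0,   # "just over 17"
    (16384, 17): 8.0,     # "approximately 8"
    (8192, 9): 5.0,       # "only 5"
}


def efficiency(speedup, p):
    """Fraction of perfect (linear) speedup achieved on p processors."""
    return speedup / p


for (N, p), s in quoted.items():
    print(N, p, round(efficiency(s, p), 2))
```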

Conclusions 

The transform for real and even data is one member of a family of algorithms that efficiently compute the transforms of symmetric sequences. The serial versions of these compact symmetric algorithms provide a tremendous savings over the direct use of the complex FFT to transform these sequences. They also offer a savings over traditional pre- and post-processing algorithms, using the same total storage, but requiring somewhat fewer arithmetic operations.

The compact symmetric algorithms have straightforward extensions to shared memory parallel computers, and produce additional savings from parallelization. A major benefit is that FFTs are usually performed as part of some larger calculation, which in turn is made more efficient. This is especially true for many of the symmetric sequences, which arise in the direct solution of partial differential equations with various boundary conditions. Much recent research [10] has centered on improving the performance of these larger computations by implementing them on parallel machines. The utilization of the family of parallel compact symmetric FFTs should represent a significant contribution to that effort.
Figure 1

Schematic diagram of the Swarztrauber Algorithm. F{ } indicates the Discrete Fourier Transform. The asterisks represent the redundant R sequences which are not calculated or stored. The portion of the diagram to the left of the column of bars is the ordering phase, that to the right of the bars is the combine phase. The column immediately to the left of the column of bars is the E sequence in scrambled order; the final F{E} on the right is the transform in natural order.
Figure 2

Storage and data flow diagram for compact real and even transform, with N = 32, taking ordered data to scrambled coefficients. R and I refer to the real and imaginary parts of the complex quantity. During each pass through the data, the first set of lines is the EQE uncombine, the second set is the RRQE uncombine, and all other sets are RtoR uncombines.
Figure 3

Measured Speedup for long sequences. Speedup curves are shown for N = 131072 (#), N = 65536 (x), and N = 32768 (+). Perfect speedup is represented by the diagonal line. Theoretical speedup is plotted at p = 2, 4, 8 for N = 131072 (solid triangle), N = 65536 (solid square), and N = 32768 (solid circle). At p = 4 only the square is plotted, as all three computed values fell within the size of the square.
Figure 4

Measured Speedup for 'short' sequences. Speedup curves are shown for N = 16384 (*), N = 8192 (x), and N = 4096 (+). Perfect speedup is represented by the diagonal line. Theoretical speedup is plotted at p = 2, 4, 8 for N = 16384 (solid triangle), N = 8192 (solid square), and N = 4096 (solid circle). At p = 4 only the square is plotted, as all three computed values fell within the size of the square.
Abstract. We discuss a group of parallel algorithms, and their implementations, for solving a special class of nonlinear equations arising in VLSI design, structural engineering and other areas. The class of sparsity occurring in these problems is called block bordered structure. We present the explicit method and several implicit methods for solving block bordered nonlinear problems, and give some mathematical analysis and comparisons of the two methods. Several variations and globally convergent modifications of the implicit method are also described. We present computational results on a sequential computer that help compare and justify the efficiency of the algorithms. Finally, the implementations on shared memory multiprocessors and local memory multiprocessors are discussed.


1.  Introduction. 

The solution of a system of nonlinear equations is one of the most basic and important problems encountered in many applications. The general form of a system of nonlinear equations is:

$$f_i(x_1, x_2, \ldots, x_n) = 0, \qquad i = 1, \ldots, n. \qquad (1.1)$$

Several parallel algorithms for solving (1.1) have been developed and implemented on some parallel computers. Newton’s method is the main approach in those algorithms. Thus, most algorithms for solving (1.1) consist mainly of solving the linear Jacobian system. Many parallel algorithms have been developed for solving a linear system, such as parallel factorizations, the parallel SOR method, the parallel red-black method, parallel multicolor methods and so on (see e.g. Ortega and Voigt [1985]). One of the typical parallel Newton methods for solving (1.1) is called Newton-Jacobi (or Newton-SOR, or Newton-Gauss-Seidel), in which the main iteration is the Newton iteration for solving (1.1), and the inner loop solves the linear system iteratively by the Jacobi method (or SOR method, or Gauss-Seidel method) (see e.g. O’Leary and White [1985], White [1986]). Fontecilla [1987] gives a parallel implementation of a different approach, the serial nonlinear Jacobi algorithm for (1.1). This algorithm is based on the same idea as the Jacobi algorithm for solving linear systems of equations. The Jacobi (or SOR) iteration is the primary iteration, and Newton iterations are used to approximately solve the $j$th block of equations for the $j$th block of variables in the inner loop, where $j = 1, \ldots, m$ for $m \le n$. This method is called the Jacobi-Newton method. Coleman and Li [1987] develop parallel algorithms for the solution of (1.1) on a message-passing multiprocessor computer with a distributed finite-difference Newton method, a multiple secant method and a rank-1 secant method.

This research is partially supported by ARO contracts DAAL-03-k-0086 and AFOSR grant AFOSR-8S-0251.

In the case of very large nonlinear problems one cannot expect a single parallel algorithm to handle all instances of the nonlinear problem (1.1) efficiently; rather, the algorithm must take into account the sparsity structure and other special characteristics of the problem. In fact, many nonlinear problems in applications have their own special sparsity structure. Parallel algorithms taking advantage of the special structure can be much more efficient than algorithms ignoring it. In this paper we give a group of parallel algorithms and implementations for solving a special class of nonlinear equations arising in VLSI design, structural engineering and other areas. The class of sparsity occurring in these problems is called block bordered structure. In such a problem the $n$ variables and equations may be grouped into $q+1$ subvectors, $x_1, \ldots, x_{q+1}$ and $f_1, \ldots, f_{q+1}$, such that the nonlinear system of equations has the form

$$f_i(x_i, x_{q+1}) = 0, \qquad i = 1, \ldots, q$$
$$f_{q+1}(x_1, \ldots, x_{q+1}) = 0 \qquad (1.2)$$

where

$$x_i \in R^{n_i}, \quad f_i \in R^{n_i}, \quad i = 1, \ldots, q+1, \quad \text{and} \quad \sum_{i=1}^{q+1} n_i = n.$$
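The block bordered Jacobian matrix of (1.2) is as follows:

$$J(x) = \begin{pmatrix} A_1 & & & & B_1 \\ & A_2 & & & B_2 \\ & & \ddots & & \vdots \\ & & & A_q & B_q \\ C_1 & C_2 & \cdots & C_q & P \end{pmatrix} \qquad (1.3)$$

where

$$A_i = \frac{\partial f_i}{\partial x_i} \in R^{n_i \times n_i}, \quad B_i = \frac{\partial f_i}{\partial x_{q+1}} \in R^{n_i \times n_{q+1}}, \quad C_i = \frac{\partial f_{q+1}}{\partial x_i} \in R^{n_{q+1} \times n_i}, \quad i = 1, \ldots, q,$$

and $P = \partial f_{q+1} / \partial x_{q+1}$.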
In VLSI design, $P$ is a permutation matrix and the structure of $B_i$ is in the form of

(1.4)

i.e. only one $B_i$ is nonzero in any given column of the right-hand border (see e.g. Rabbat et al [1979], [1980]). In addition, the equations $f_{q+1}$ are linear. We will concentrate on systems with this special structure.


We will give a group of parallel algorithms for solving block bordered nonlinear systems of the form (1.2) which may be implemented on both shared memory multiprocessors, such as the Encore Multimax, and local memory multiprocessors, such as the Intel hypercube. In section 2, we give some background on block bordered problems and survey related work. Section 3 presents the explicit method and the implicit method for solving the block bordered nonlinear problem, and gives some mathematical analysis and comparisons of the two methods. Several variations of the implicit method are also described. Experiments with the two methods on a sequential computer are given based on the analysis of that section. Global strategies for the different methods and their implementations are given in section 4. Parallel versions of these methods are described in section 5. Our conclusions and some future research directions are summarized in section 6.


2.  Background  and  related  work. 

Block bordered systems of equations having the form (1.2) arise in many areas of engineering and science, and a few algorithms have been developed to solve them. In structural engineering, models of large structures may be divided into $q$ regions such that each region only interacts directly with neighboring regions. The $x_i$ are the variables for each region, and the extra linking variables (the $x_{q+1}$) are introduced at the boundaries of the regions. The linking variables are tied together with a $(q+1)$st set of equations representing the interactions between the regions. Thus the equilibrium equations for such a model will be of the form (1.2). In addition, the Jacobian matrix is symmetric, i.e. $C_i = B_i^T$, and often the sizes of the $A_i$ are the same. One current parallel algorithm to solve the problem in the linear case (see e.g. Farhat and Wilson [1986]) is to let each processor hold the pair $(A_i, B_i)$ as well as $f_i$ and $x_i$. Then the updates $\{x_i,\ i = 1, \ldots, q\}$ can all be performed concurrently by solving the subsystems in parallel:

$$A_i x_i^{k+1} = f_i - B_i x_{q+1}^{k}, \qquad i = 1, \ldots, q \qquad (2.1)$$

and the components $x_i$ are updated locally in each processor using sequential SOR iterations. It remains to update the unknowns associated with block $P$. This block is coupled to all the $A_i$ terms. If its size is negligible compared to each of the sizes of the diagonal blocks, the overall algorithm will suffer a serialization for only a small amount of time. If not (and this is usually the case for three dimensional structures), the updates of $x_{q+1}$ may ruin the sought after speed-up. The algorithm is simple to implement, and efficient for the special engineering structure problem because the problem is linear and the function $f_{q+1}$ is relatively small. Because the coefficient matrix (1.3) of the linear system is symmetric and positive definite, a parallel conjugate gradient method is also efficient for its solution (see e.g. Nour-Omid and Park [1986]).
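For the linear case, the scheme around (2.1) can be sketched as a block iteration. This is a hypothetical illustration, not the cited implementation: each diagonal block is solved directly with `numpy.linalg.solve` instead of by local SOR sweeps, and the border unknowns are then updated from the new $x_i$ by solving with $P$ (a block Gauss-Seidel step), which is one possible choice for the update the text leaves open. The random test matrix is constructed so that the iteration converges.

```python
import numpy as np


def block_bordered_iteration(A, B, C, P, f, g, sweeps=200):
    """Iterate on  A[i] x_i + B[i] y = f[i] (i = 1..q),  sum_i C[i] x_i + P y = g."""
    y = np.zeros(P.shape[0])
    for _ in range(sweeps):
        # (2.1): the q block solves are independent and could run in parallel
        xs = [np.linalg.solve(Ai, fi - Bi @ y) for Ai, Bi, fi in zip(A, B, f)]
        # serial step: update the border unknowns (block P) from the new x_i
        y = np.linalg.solve(P, g - sum(Ci @ xi for Ci, xi in zip(C, xs)))
    return xs, y


def assemble(A, B, C, P):
    """Dense block bordered matrix, used here only for checking."""
    q, n, m = len(A), A[0].shape[0], P.shape[0]
    K = np.zeros((q * n + m, q * n + m))
    for i in range(q):
        rows = slice(i * n, (i + 1) * n)
        K[rows, rows] = A[i]
        K[rows, q * n:] = B[i]
        K[q * n:, rows] = C[i]
    K[q * n:, q * n:] = P
    return K


rng = np.random.default_rng(0)
q, n, m = 3, 4, 3
B = [0.3 * rng.standard_normal((n, m)) for _ in range(q)]
C = [Bi.T for Bi in B]                           # symmetric Jacobian: C_i = B_i^T
S = [rng.standard_normal((n, n)) for _ in range(q)]
A = [4 * np.eye(n) + 0.05 * (Si + Si.T) for Si in S]
P = 4 * np.eye(m)
f = [rng.standard_normal(n) for _ in range(q)]
g = rng.standard_normal(m)

xs, y = block_bordered_iteration(A, B, C, P, f, g)
exact = np.linalg.solve(assemble(A, B, C, P), np.concatenate(f + [g]))
print(np.allclose(np.concatenate(xs + [y]), exact))
```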

Similar equations arise in the analysis of VLSI design, where the circuits may be subdivided. The concept of macromodeling the circuit is to decompose the circuit into subcircuits and to analyze them separately. Macromodeling of the circuit results in a system of nonlinear equations of the form (1.2). The $x_i$ $(i = 1, \ldots, q)$ and $x_{q+1}$ are usually used to represent the internal and input-output variables, respectively, of the $q$ independent subcircuits. Here the equations involve voltages and currents, and those between the subcircuits serve a linking role which results in the function $f_{q+1}$. Since each voltage or current is used only in one block of equations $f_i$ plus possibly the bottom block $f_{q+1}$, the nonzero columns of the $B_i$ (and $A_i$) are disjoint, and so the form (1.4) results. The size of the function $f_{q+1}$ may be quite large.

Two nested sequential algorithms taking advantage of the structural properties of VLSI circuits have been developed by Rabbat et al [1979], [1980]. The multi-Newton method applies Newton’s method to $f_{q+1}$ of (1.2), where the $x_j$ $(j = 1, \ldots, q)$ are implicitly determined by the $f_j$ of (1.2), and another Newton method is applied to solve for them in the inner loop. This is discussed further in section 3. Similarly, the Gauss-Seidel-Newton method applies the Gauss-Seidel method to $f_{q+1}$ of (1.2), where the $x_i$ $(i = 1, \ldots, q)$ are implicitly determined by the $f_i$ of (1.2), and Newton’s method is applied to solve for them in the inner loop. These algorithms appear suitable for implementation on parallel computers, but to our knowledge this has not been done.

3.  Explicit  and  implicit  approaches  to  the  problem 

There are two basic ways in which Newton's method can be applied to (1.2). The explicit approach is Newton's method applied directly to the whole system (1.2), which simply involves iteratively solving the linear system

$$ J(X^k)\,\Delta X^k = -F(X^k), \qquad k = 0, 1, \dots \qquad (3.1) $$

for $\Delta X^k$, where $J(X^k)$ is the Jacobian of $F$, which has the block bordered structure of (1.3).

The implicit approach is to solve, or approximately solve, each of the $q$ equations

$$ f_i(x_i, x_{q+1}) = 0, \qquad i = 1, \dots, q \qquad (3.2) $$

for a fixed value of $x_{q+1}$. This means that each $x_i$ is implicitly given as a function of $x_{q+1}$. The whole problem (1.2) is then equivalent to solving

$$ f_{q+1}\big(x_1(x_{q+1}), \dots, x_q(x_{q+1}), x_{q+1}\big) = 0. \qquad (3.3) $$

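The Jacobian of this system is given by

$$ \hat J = \frac{\partial f_{q+1}}{\partial x_{q+1}} + \sum_{i=1}^{q} \frac{\partial f_{q+1}}{\partial x_i}\,\frac{\partial x_i}{\partial x_{q+1}} $$

or, since $\partial x_i / \partial x_{q+1} = -A_i^{-1} B_i$ by implicit differentiation of (3.2),

$$ \hat J = P - \sum_{i=1}^{q} C_i A_i^{-1} B_i, \qquad (3.4) $$

and we may solve (3.3) by Newton's method.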
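The reduced Jacobian (3.4) can be checked numerically. The sketch below is a minimal illustration under stated assumptions: it uses a hypothetical toy problem that is not from the paper, with $q = 3$ scalar blocks $f_i(x_i, x_{q+1}) = x_i + 0.5x_i^3 - x_{q+1} - c_i$ and a linear bottom equation $f_{q+1} = \sum_i x_i + 3x_{q+1} - d$, so that $A_i = 1 + 1.5x_i^2$, $B_i = -1$, $C_i = 1$ and $P = 3$. It solves (3.2) for each $x_i(x_{q+1})$ by Newton's method and compares a central difference of the left side of (3.3) with $\hat J$ from (3.4). All function names (`f_block`, `x_of_y`, `reduced`, `J_hat`) are illustrative.

```python
# Hypothetical toy problem with q = 3 scalar blocks (not from the paper).
# Its solution is x = (1, 0, -1), x_{q+1} = 1.
C_RHS = [0.5, -1.0, -2.5]   # constants c_i
D_RHS = 3.0                 # constant d
B, C, P = -1.0, 1.0, 3.0    # B_i, C_i and P are constant for this toy problem


def f_block(i, xi, y):
    """f_i(x_i, x_{q+1}) = x_i + 0.5 x_i^3 - x_{q+1} - c_i."""
    return xi + 0.5 * xi**3 - y - C_RHS[i]


def A(xi):
    """A_i = df_i/dx_i."""
    return 1.0 + 1.5 * xi**2


def f_border(x, y):
    """Linear bottom equation f_{q+1}(x_1, ..., x_q, x_{q+1})."""
    return C * sum(x) + P * y - D_RHS


def x_of_y(i, y, iters=50):
    """Solve (3.2) for x_i with x_{q+1} = y held fixed (Newton's method)."""
    x = 0.0
    for _ in range(iters):
        x -= f_block(i, x, y) / A(x)
    return x


def reduced(y):
    """Left-hand side of (3.3): f_{q+1}(x_1(y), ..., x_q(y), y)."""
    return f_border([x_of_y(i, y) for i in range(len(C_RHS))], y)


def J_hat(y):
    """Reduced Jacobian (3.4): P - sum_i C_i A_i^{-1} B_i at x_i = x_i(y)."""
    return P - sum(C * B / A(x_of_y(i, y)) for i in range(len(C_RHS)))


y, h = 0.3, 1e-6
fd = (reduced(y + h) - reduced(y - h)) / (2 * h)   # central difference of (3.3)
print(fd, J_hat(y))
```

The two printed values should agree closely; the check is just the chain rule combined with $\partial x_i / \partial x_{q+1} = -A_i^{-1} B_i$.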

In  this  section  we  describe  these  two  approaches  and  their  relations  to  each  other,  and  give  some 
experimental  results  on  a  sequential  computer. 

3.1  Explicit method and implicit method

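Newton's method applied to (1.2) in the explicit method consists of the following formulas at iteration $k$ ($k = 0, 1, \dots$). From $f_i$, $i = 1, \dots, q$:

$$ \frac{\partial f_i}{\partial x_i}\,\Delta x_i^k + \frac{\partial f_i}{\partial x_{q+1}}\,\Delta x_{q+1}^k + f_i(x_i^k, x_{q+1}^k) = 0, $$

or equivalently

$$ A_i\,\Delta x_i^k + B_i\,\Delta x_{q+1}^k + f_i(x_i^k, x_{q+1}^k) = 0, \qquad (3.1.1) $$

and from $f_{q+1}$:

$$ \sum_{i=1}^{q} \frac{\partial f_{q+1}}{\partial x_i}\,\Delta x_i^k + \frac{\partial f_{q+1}}{\partial x_{q+1}}\,\Delta x_{q+1}^k + f_{q+1}(x_1^k, \dots, x_q^k, x_{q+1}^k) = 0, $$

or equivalently

$$ \sum_{i=1}^{q} C_i\,\Delta x_i^k + P\,\Delta x_{q+1}^k + f_{q+1}(x_1^k, \dots, x_q^k, x_{q+1}^k) = 0. \qquad (3.1.2) $$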

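Substituting (3.1.1) into (3.1.2), we obtain

$$ \Big(P - \sum_{i=1}^{q} C_i A_i^{-1} B_i\Big)\Delta x_{q+1}^k = -f_{q+1}(x_1^k, \dots, x_q^k, x_{q+1}^k) + \sum_{i=1}^{q} C_i A_i^{-1} f_i(x_i^k, x_{q+1}^k) $$

or

$$ \hat J\,\Delta x_{q+1}^k = -f_{q+1}(x_1^k, \dots, x_q^k, x_{q+1}^k) + \sum_{i=1}^{q} C_i A_i^{-1} f_i(x_i^k, x_{q+1}^k), \qquad (3.1.3) $$

where $\hat J$ is given by (3.4). So

$$ x_{q+1}^{k+1} = x_{q+1}^k + \Delta x_{q+1}^k \qquad (3.1.4) $$

can be determined from (3.1.3), and

$$ x_i^{k+1} = x_i^k + \Delta x_i^k, \qquad i = 1, \dots, q \qquad (3.1.5) $$

can be determined from (3.1.1).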
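The explicit step (3.1.1)-(3.1.5) amounts to block elimination: solve the bordered equation (3.1.3) for $\Delta x_{q+1}^k$, then back-substitute into (3.1.1) block by block. A minimal Python sketch, using a hypothetical scalar toy problem that is assumed for illustration only (not from the paper), with `explicit_step` as an illustrative name:

```python
# Hypothetical toy problem with q = 3 scalar blocks (not from the paper);
# its solution is x = (1, 0, -1), x_{q+1} = 1.
C_RHS = [0.5, -1.0, -2.5]   # constants c_i
D_RHS = 3.0                 # constant d
B, C, P = -1.0, 1.0, 3.0    # constant B_i, C_i and P for this toy problem


def f_block(i, xi, y):
    """f_i(x_i, x_{q+1}) = x_i + 0.5 x_i^3 - x_{q+1} - c_i."""
    return xi + 0.5 * xi**3 - y - C_RHS[i]


def A(xi):
    """A_i = df_i/dx_i."""
    return 1.0 + 1.5 * xi**2


def f_border(x, y):
    """Linear bottom equation f_{q+1}."""
    return C * sum(x) + P * y - D_RHS


def explicit_step(x, y):
    """One explicit Newton step, (3.1.1)-(3.1.5), by block elimination."""
    Ai = [A(xi) for xi in x]
    fi = [f_block(i, xi, y) for i, xi in enumerate(x)]
    J_hat = P - sum(C * B / a for a in Ai)                        # (3.4)
    rhs = -f_border(x, y) + sum(C * f / a for f, a in zip(fi, Ai))
    dy = rhs / J_hat                                              # (3.1.3)
    dx = [-(f + B * dy) / a for f, a in zip(fi, Ai)]              # (3.1.1)
    return [xi + d for xi, d in zip(x, dx)], y + dy               # (3.1.4), (3.1.5)


def max_residual(x, y):
    r = [abs(f_block(i, xi, y)) for i, xi in enumerate(x)]
    return max(r + [abs(f_border(x, y))])


x, y = [0.0, 0.0, 0.0], 0.0
for k in range(50):
    if max_residual(x, y) < 1e-12:
        break
    x, y = explicit_step(x, y)
print(k, x, y)
```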

In the implicit method, Newton's method is applied to (3.3), and gives

$$ \hat J\,\Delta x_{q+1}^k + f_{q+1}\big(x_1^{k+1,0}(x_{q+1}^k), \dots, x_q^{k+1,0}(x_{q+1}^k), x_{q+1}^k\big) = 0, \qquad (3.1.6) $$

where $x_i^{k+1,0}(x_{q+1}^k)$ ($i = 1, \dots, q$) is implicitly determined by solving the nonlinear system

$$ f_i(x_i^{k,j}, x_{q+1}^k) = 0 \qquad (3.1.7) $$

for $x_i^{k,j}$. Here $j$ is the inner iteration number for solving (3.1.7) for $x_i$, and $k$ is the outer iteration number for solving (1.2). We use a second (or inner) Newton process on (3.1.7) to evaluate $x_i(x_{q+1})$, which yields

$$ \frac{\partial f_i}{\partial x_i}\,\Delta x_i^{k,j-1} + f_i(x_i^{k,j-1}, x_{q+1}^k) = 0, \qquad i = 1, \dots, q, \quad j = 1, 2, \dots $$

or

$$ \bar A_i\,\Delta x_i^{k,j-1} + f_i(x_i^{k,j-1}, x_{q+1}^k) = 0, \qquad i = 1, \dots, q, \quad j = 1, 2, \dots \qquad (3.1.8) $$

where $\bar A_i = A_i$ if it is evaluated only once at the beginning of each outer iteration; otherwise it may be re-evaluated up to $j$ times. This second Newton process is at a lower level, since $x_{q+1}^k$ is determined from (3.1.6) and is held fixed in (3.1.8). Thus $j = 1, 2, \dots$ varies in the inner loop while $k$ is fixed by the outer loop. Then

$$ x_i^{k,j} = x_i^{k,j-1} + \Delta x_i^{k,j-1}. \qquad (3.1.9) $$

When $x_i^{k,j}$ exits from the inner loop, it is set to

$$ x_i^{k+1,0} = x_i^{k,j}. \qquad (3.1.10) $$

Then $\Delta x_{q+1}^k$ is determined from (3.1.6), and

$$ x_{q+1}^{k+1} = x_{q+1}^k + \Delta x_{q+1}^k. \qquad (3.1.11) $$
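A corresponding sketch of one outer iteration of the (uncorrected) implicit method, on the same hypothetical scalar toy problem (an illustration, not from the paper), is below. It assumes the option $\bar A_i = A_i(x_i^{k,0})$, i.e. $A_i$ is evaluated once per outer iteration, and runs a fixed number `j` of inner iterations; `implicit_step` is an illustrative name. With `j=1` and $f_{q+1}$ linear, the step in $x_{q+1}$ coincides with the explicit one, which is the content of Theorem 2 below.

```python
# Hypothetical toy problem with q = 3 scalar blocks (not from the paper);
# its solution is x = (1, 0, -1), x_{q+1} = 1.
C_RHS = [0.5, -1.0, -2.5]
D_RHS = 3.0
B, C, P = -1.0, 1.0, 3.0


def f_block(i, xi, y):
    return xi + 0.5 * xi**3 - y - C_RHS[i]


def A(xi):
    return 1.0 + 1.5 * xi**2


def f_border(x, y):
    return C * sum(x) + P * y - D_RHS


def implicit_step(x, y, j):
    """One outer step of the uncorrected implicit method with j inner iterations.

    The inner iterations (3.1.8)-(3.1.9) hold x_{q+1} = y fixed and use
    A_bar_i = A_i(x_i^{k,0}), evaluated once per outer iteration.
    """
    A0 = [A(xi) for xi in x]
    xin = list(x)
    for _ in range(j):                                   # inner loop
        xin = [xi - f_block(i, xi, y) / A0[i] for i, xi in enumerate(xin)]
    J_hat = P - sum(C * B / a for a in A0)               # (3.4) at x^{k,0}
    dy = -f_border(xin, y) / J_hat                       # (3.1.6) with (3.1.10)
    return xin, y + dy                                   # (3.1.11)


def max_residual(x, y):
    r = [abs(f_block(i, xi, y)) for i, xi in enumerate(x)]
    return max(r + [abs(f_border(x, y))])


x, y = [0.9, 0.1, -0.9], 0.9        # a starting point near the solution
for k in range(100):
    if max_residual(x, y) < 1e-12:
        break
    x, y = implicit_step(x, y, j=2)
print(k, x, y)
```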

3.2  Comparisons and analysis of the two methods

The  following  theorems  show  that  the  explicit  and  implicit  methods  are  very  closely  related. 

Theorem 1. If the function $f_{q+1}$ of the nonlinear block bordered problem (1.2) is linear, then the equation solved for $\Delta x_{q+1}^k$ ($k = 0, 1, \dots$) in the implicit method is equivalent to the one in the explicit method, except that the values of $x_i$ that are used may be different.

Proof: Substituting (3.1.8) into (3.1.9) gives the implicit formula for the $x_i$ ($i = 1, \dots, q$):

$$ x_i^{k,j} = x_i^{k,j-1} - \bar A_i^{-1} f_i(x_i^{k,j-1}, x_{q+1}^k), \qquad (3.2.1) $$

where $j = 1, 2, \dots$, $k$ is fixed, and $\bar A_i = A_i$ if it is evaluated only once at the beginning; otherwise it may be re-evaluated up to $j$ times.

Substituting (3.2.1) into (3.1.6) and using the linearity of $f_{q+1}$ gives

$$ \hat J\,\Delta x_{q+1}^k = -f_{q+1}(x_1^{k,j-1}, \dots, x_q^{k,j-1}, x_{q+1}^k) + \sum_{i=1}^{q} C_i \bar A_i^{-1} f_i(x_i^{k,j-1}, x_{q+1}^k). \qquad (3.2.2) $$

Equation (3.2.2) is equivalent to (3.1.3), the explicit formula, except that different values of the variables may be used in the two formulas.

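Theorem 2. If $f_{q+1}$ is linear and only one Newton iteration is applied to solve for $x_i$ ($i = 1, \dots, q$), i.e. $j = 1$, in the implicit method, then the steps $\Delta x_{q+1}^k$ (for a fixed $k$) are identical in both methods.

Proof: This follows immediately by substituting $j = 1$ into (3.2.2), since then $\bar A_i = A_i$ and $x_i^{k,0} = x_i^k$:

$$ \hat J\,\Delta x_{q+1}^k = -f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k) + \sum_{i=1}^{q} C_i A_i^{-1} f_i(x_i^{k,0}, x_{q+1}^k), \qquad (3.2.3) $$

which is identical to the explicit formula (3.1.3).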

Theorem 3. If $-A_i^{-1} B_i\,\Delta x_{q+1}^k$ is added to the right-hand side of (3.1.10), then the equation solved for $\Delta x_i$ ($i = 1, \dots, q$) in the implicit method is equivalent to the one in the explicit method, except that the values of $x_i$ that are used may be different.

Proof: Adding $-A_i^{-1} B_i\,\Delta x_{q+1}^k$ to (3.1.10), and substituting (3.1.9) and (3.1.8) into (3.1.10), gives

$$ x_i^{k+1,0} = x_i^{k,j-1} - \bar A_i^{-1} f_i(x_i^{k,j-1}, x_{q+1}^k) - A_i^{-1} B_i\,\Delta x_{q+1}^k, \qquad (3.2.4) $$

which is equivalent to the explicit formula obtained by substituting (3.1.1) into (3.1.5), except that different values of the variables may be used in the two formulas.

Theorem 4. If $f_{q+1}$ is linear, only one Newton iteration is applied to solve for $x_i$ ($i = 1, \dots, q$), i.e. $j = 1$, in the implicit method, and the system is corrected by adding $-A_i^{-1} B_i\,\Delta x_{q+1}^k$ to $x_i$ after each iteration, then the explicit method and the implicit method are identical.

Proof: From Theorem 2, the steps $\Delta x_{q+1}^k$ are identical for the two methods. Substituting $j = 1$ into (3.2.4) gives

$$ x_i^{k+1,0} = x_i^{k,0} - A_i^{-1}\big(f_i(x_i^{k,0}, x_{q+1}^k) + B_i\,\Delta x_{q+1}^k\big), \qquad (3.2.5) $$

which is identical to the explicit method and completes the proof.

The following theorems give the local convergence rates of the explicit and implicit methods. Theorem 5 follows from standard theory. The proof of Theorem 6 is given in Zhang [1989].

Theorem 5. Assume that $F(x)$ is continuously differentiable in an open convex set $D \subset \mathbb{R}^n$. Assume that there exists $x^* \in D$ such that $F(x^*) = 0$, $J(x^*)$ is nonsingular, and $J(x)$ is Lipschitz continuous in an open neighborhood containing $x^*$. Then the explicit Newton's method is locally quadratically convergent to $x^*$.

Theorem 6. Let the assumptions of Theorem 5 hold, and assume in addition that each $A_i(x^*)$ is nonsingular and that each $A_i(x)$ is Lipschitz continuous in an open neighborhood containing $x^*$. Then the implicit Newton method with one inner iteration per outer iteration ($j = 1$) is locally 2-step quadratically convergent to $x^*$.

3.3  A corrected implicit method

Theorem 3 and Theorem 4 indicate that the implicit method may obtain the same quadratic rate of convergence as the explicit method, even if the inner iteration is solved inexactly, provided a correction term is added to (3.1.10) after each iteration. The problem may be stated as finding a correction term $\delta$ such that

$$ f_i(x_i^{k+1,0} + \delta, x_{q+1}^{k+1}) = 0, \qquad (3.3.1) $$

or

$$ f_i(x_i^{k+1,0} + \delta, x_{q+1}^k + \Delta x_{q+1}^k) = 0. $$

(3.3.1) may be approximated by treating the function $f_i$ as linear:

$$ f_i(x_i^{k+1,0}, x_{q+1}^k) + A_i\,\delta + B_i\,\Delta x_{q+1}^k = 0. \qquad (3.3.2) $$

The correction term $\delta$ is obtained from (3.3.2):

$$ \delta = -A_i^{-1}\big(f_i(x_i^{k+1,0}, x_{q+1}^k) + B_i\,\Delta x_{q+1}^k\big). \qquad (3.3.3) $$

After $j$ inner iterations for a fixed $k$, $f_i(x_i^{k+1,0}, x_{q+1}^k) \approx 0$. Thus we may make a further approximation for the correction term $\delta$:

$$ \delta = -A_i^{-1} B_i\,\Delta x_{q+1}^k, \qquad (3.3.4) $$

which is exactly the correction term used in Theorem 3.
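The corrected implicit step differs from the uncorrected one only by adding $\delta = -A_i^{-1} B_i\,\Delta x_{q+1}^k$ from (3.3.4) to each $x_i$. The sketch below uses the same hypothetical scalar toy problem as the earlier sketches (not from the paper; `explicit_step` and `corrected_implicit_step` are illustrative names) and compares the corrected step with the explicit step for $j = 1$, which Theorem 4 says should give the same iterate.

```python
# Hypothetical toy problem with q = 3 scalar blocks (not from the paper).
C_RHS = [0.5, -1.0, -2.5]
D_RHS = 3.0
B, C, P = -1.0, 1.0, 3.0


def f_block(i, xi, y):
    return xi + 0.5 * xi**3 - y - C_RHS[i]


def A(xi):
    return 1.0 + 1.5 * xi**2


def f_border(x, y):
    return C * sum(x) + P * y - D_RHS


def explicit_step(x, y):
    """Explicit Newton step (3.1.1)-(3.1.5)."""
    Ai = [A(xi) for xi in x]
    fi = [f_block(i, xi, y) for i, xi in enumerate(x)]
    J_hat = P - sum(C * B / a for a in Ai)
    dy = (-f_border(x, y) + sum(C * f / a for f, a in zip(fi, Ai))) / J_hat
    return [xi - (f + B * dy) / a for xi, f, a in zip(x, fi, Ai)], y + dy


def corrected_implicit_step(x, y, j):
    """Implicit step with j inner iterations plus the correction (3.3.4)."""
    A0 = [A(xi) for xi in x]                        # A_i evaluated once per outer step
    xin = list(x)
    for _ in range(j):                              # inner iterations (3.1.8)-(3.1.9)
        xin = [xi - f_block(i, xi, y) / A0[i] for i, xi in enumerate(xin)]
    J_hat = P - sum(C * B / a for a in A0)
    dy = -f_border(xin, y) / J_hat                  # (3.1.6)
    x_new = [xi - B * dy / a for xi, a in zip(xin, A0)]   # add delta = -A_i^{-1} B_i dy
    return x_new, y + dy


x0, y0 = [0.3, -0.2, 0.7], 0.1                      # arbitrary starting point
xe, ye = explicit_step(x0, y0)
xc, yc = corrected_implicit_step(x0, y0, j=1)
# Theorem 4: with j = 1 the two steps agree up to round-off.
print(max(abs(a - b) for a, b in zip(xe + [ye], xc + [yc])))
```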

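We prove a lemma showing that one step of the corrected implicit method has a structure similar to one step of the explicit method.

Lemma 1. One step of the corrected implicit method with $j$ inner iterations is identical to solving the following linear system of equations:

$$ \begin{pmatrix} A_1 & & & B_1 \\ & \ddots & & \vdots \\ & & A_q & B_q \\ C_1 & \cdots & C_q & P \end{pmatrix} \begin{pmatrix} \Delta x_1^k \\ \vdots \\ \Delta x_q^k \\ \Delta x_{q+1}^k \end{pmatrix} = - \begin{pmatrix} \sum_{l=0}^{j-1} f_1(x_1^{k,l}, x_{q+1}^k) \\ \vdots \\ \sum_{l=0}^{j-1} f_q(x_q^{k,l}, x_{q+1}^k) \\ f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k) \end{pmatrix}, \qquad (3.3.5a) $$

and one step of the uncorrected implicit method with $j$ inner iterations is identical to solving the following linear system of equations:

$$ \begin{pmatrix} A_1 & & & 0 \\ & \ddots & & \vdots \\ & & A_q & 0 \\ C_1 & \cdots & C_q & \tilde P \end{pmatrix} \begin{pmatrix} \Delta x_1^k \\ \vdots \\ \Delta x_q^k \\ \Delta x_{q+1}^k \end{pmatrix} = - \begin{pmatrix} \sum_{l=0}^{j-1} f_1(x_1^{k,l}, x_{q+1}^k) \\ \vdots \\ \sum_{l=0}^{j-1} f_q(x_q^{k,l}, x_{q+1}^k) \\ f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k) \end{pmatrix}, \qquad (3.3.5b) $$

where $\tilde P = P - \sum_{i=1}^{q} C_i A_i^{-1} B_i$.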

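Proof: Recall that $x_i$ ($i = 1, \dots, q$) is calculated in the inner iterations using equations (3.1.8), (3.1.9) and (3.1.10), and $x_{q+1}$ is calculated following the inner iterations by (3.1.6).

Substituting $x_i^{k+1,0} = x_i^{k,0} + \sum_{l=0}^{j-1} \Delta x_i^{k,l}$ into (3.1.6) and using the linearity of $f_{q+1}$, we obtain

$$ \Big(P - \sum_{i=1}^{q} C_i A_i^{-1} B_i\Big)\Delta x_{q+1}^k = -f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k) - \sum_{i=1}^{q} C_i \sum_{l=0}^{j-1} \Delta x_i^{k,l}, $$

that is,

$$ \sum_{i=1}^{q} C_i\Big[-A_i^{-1}\Big(\sum_{l=0}^{j-1} f_i(x_i^{k,l}, x_{q+1}^k) + B_i\,\Delta x_{q+1}^k\Big)\Big] + P\,\Delta x_{q+1}^k = -f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k), \qquad (3.3.6a) $$

where

$$ -A_i^{-1}\Big(\sum_{l=0}^{j-1} f_i(x_i^{k,l}, x_{q+1}^k) + B_i\,\Delta x_{q+1}^k\Big) = \sum_{l=0}^{j-1} \Delta x_i^{k,l} - A_i^{-1} B_i\,\Delta x_{q+1}^k. \qquad (3.3.6b) $$

The right-hand side of (3.3.6b) is the corrected step $x_i^{k+1,0} - x_i^{k,0}$ after the outer iteration is complete. Letting the corrected step be $\Delta x_i^k$, (3.3.6a) becomes

$$ \sum_{i=1}^{q} C_i\,\Delta x_i^k + P\,\Delta x_{q+1}^k = -f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k). \qquad (3.3.7a) $$

Equation (3.3.6b) for the corrected $\Delta x_i^k$ may be converted to

$$ A_i\,\Delta x_i^k + B_i\,\Delta x_{q+1}^k = -\sum_{l=0}^{j-1} f_i(x_i^{k,l}, x_{q+1}^k). \qquad (3.3.8a) $$

This completes the proof of the first part of the lemma.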

From (3.3.6a) and (3.1.8), one step of the uncorrected implicit method is equivalent to solving

$$ A_i\,\Delta x_i^k = -\sum_{l=0}^{j-1} f_i(x_i^{k,l}, x_{q+1}^k), \qquad i = 1, \dots, q, \qquad (3.3.7b) $$

$$ \sum_{i=1}^{q} C_i\,\Delta x_i^k + \Big(P - \sum_{i=1}^{q} C_i A_i^{-1} B_i\Big)\Delta x_{q+1}^k = -f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k), \qquad (3.3.8b) $$

which is equivalent to solving the linear system of equations (3.3.5b).

From Theorems 4 and 5, we see that the corrected implicit method with one inner iteration ($j = 1$) is locally quadratically convergent. Theorem 7 shows that this rate of convergence is retained if more inner iterations are used. Its proof will be given in Zhang [1989].

Theorem 7. Let the assumptions of Theorem 6 hold. Then the corrected implicit Newton method with $j \ge 1$ inner iterations per outer iteration is locally quadratically convergent to $x^*$.

3.4  Some  experiments  on  a  sequential  processor 

We have tested the methods discussed in this section on several problems. Here we report results on a simple $20 \times 20$ nonlinear block bordered system of equations, which has four $4 \times 4$ diagonal blocks $A_1, \dots, A_4$, a $4 \times 4$ bottom block $P$, and a linear $f_{q+1}$. First, we compare the performance of the three methods when only one inner iteration ($j = 1$) is used in the uncorrected implicit and corrected implicit methods. All these experiments were run on a Pyramid P90 computer.


Experiments with the three methods (j = 1)

  method                outer iterations (seconds)
  explicit              13 (0.44)
  implicit              14 (0.40)
  corrected implicit    13 (0.40)

The explicit method and the corrected implicit method with $j = 1$ are identical (see Theorem 4). Thus the same number of iterations is used to converge to the solution. The computing times are slightly different since the implementations of the two methods are different. The implicit method converges slightly more slowly than the other two methods, which is reasonable since our analysis shows that it has a 2-step quadratic convergence rate (see Theorem 6).

Next  we  increased  the  number  of  inner  iterations  in  the  implicit  and  corrected  implicit  methods. 


Experiments with the implicit method (j > 1)

  j    outer iterations (seconds)
  1    14 (0.40)
  2     8 (0.34)
  3     7 (0.40)
  4     6 (0.44)
  5     6 (0.54)

Experiments with the corrected implicit method (j > 1)

  j    outer iterations (seconds)
  1    13 (0.40)
  2     8 (0.38)
  3     6 (0.36)
  4     6 (0.50)
  5     5 (0.54)

The experimental results show that the number of outer iterations decreases sharply when the number of inner iterations is greater than 1. However, the number of outer iterations does not keep decreasing as $j$ increases. In both methods there is an optimal $j$, whether measured by the number of outer iterations or by the least time (in these runs the least time is at $j = 2$ for the implicit method and at $j = 3$ for the corrected implicit method), but it is problem dependent. Our experiments also show that the corrected implicit method converges slightly faster than the uncorrected implicit method when $j > 1$, which is consistent with our convergence analysis.

4.  Globally  convergent  modifications  of  the  corrected  implicit  method 

The corrected implicit method was shown to be locally quadratically convergent in the last section. In this section, we give conditions for the steps generated by the explicit method and the uncorrected implicit method to be descent directions. We also discuss combining a globally convergent strategy for the corrected implicit method with a fast local strategy. We let $\|\cdot\|$ denote the $l_2$ (Euclidean) norm.

4.1  The  conditions  for  a  descent  direction 

The basic idea of a global method for solving block bordered nonlinear problems is to choose a direction $\Delta X$ from the current point $X^k$ in which $\|F\|$ decreases initially, and a new point $X^{k+1}$ in this direction from $X^k$ such that $\|F(X^{k+1})\| < \|F(X^k)\|$. Such a direction is called a descent direction.

Mathematically, $\Delta X$ is a descent direction from $X^k$ if the directional derivative of $\|F\|^2$ at $X^k$ in the direction $\Delta X$ is negative, i.e. if

$$ p = F(X^k)^T J(X^k)\,\Delta X < 0. \qquad (4.1.1) $$

If (4.1.1) holds, then it is guaranteed that for sufficiently small positive $\delta$, $\|F(X^k + \delta\,\Delta X)\| < \|F(X^k)\|$. Given a descent direction $\Delta X^k$, we set $X^{k+1} = X^k + \lambda^k \Delta X^k$ for some $\lambda^k > 0$ that makes $\|F(X^{k+1})\| < \|F(X^k)\|$, where $\lambda^k$ is chosen by a line search strategy (see e.g. Dennis and Schnabel [1983]). The following theorems indicate when the directions generated by the explicit, implicit and corrected implicit methods are descent directions.

Theorem 8. The step generated by the explicit method is a descent direction for the function $\|F(X)\|$.

Proof: Since the explicit method is a pure Newton method, the step is a descent direction. This may be shown simply by substituting $\Delta X^k = -J(X^k)^{-1} F(X^k)$ into (4.1.1):

$$ p = -F(X^k)^T J(X^k) J(X^k)^{-1} F(X^k) = -F(X^k)^T F(X^k) < 0 $$

whenever $F(X^k) \ne 0$.
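The descent test (4.1.1) and a simple line search can be sketched as follows. This is an illustrative version only: it backtracks by halving $\lambda$ and accepts the first simple decrease $\|F(X^k + \lambda\,\Delta X^k)\| < \|F(X^k)\|$, as in the text, whereas the strategies in Dennis and Schnabel [1983] use a sufficient-decrease condition. The example function $F(X) = \arctan X$ is a hypothetical choice for which the full Newton step from $X = 2$ overshoots; `directional_p` and `backtrack` are made-up names.

```python
import math


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def directional_p(F_x, J_dX):
    """p = F(X)^T (J(X) dX) from (4.1.1); dX is a descent direction iff p < 0."""
    return sum(a * b for a, b in zip(F_x, J_dX))


def backtrack(F, X, dX, shrink=0.5, max_tries=30):
    """Try lambda = 1, 1/2, 1/4, ... until ||F(X + lambda dX)|| < ||F(X)||."""
    f0 = norm(F(X))
    lam = 1.0
    for _ in range(max_tries):
        X_new = [xi + lam * di for xi, di in zip(X, dX)]
        if norm(F(X_new)) < f0:
            return lam, X_new
        lam *= shrink
    raise RuntimeError("no decrease found; dX may not be a descent direction")


def F(X):
    """Hypothetical test function F(X) = atan(X) (one equation, one unknown)."""
    return [math.atan(X[0])]


def J(X):
    """Its 1x1 Jacobian."""
    return 1.0 / (1.0 + X[0] ** 2)


X = [2.0]
dX = [-F(X)[0] / J(X)]                      # Newton (explicit) step
p = directional_p(F(X), [J(X) * dX[0]])     # equals -F^T F < 0 (Theorem 8)
lam, X_new = backtrack(F, X, dX)
print(p, lam, X_new)
```

For the Newton step, `p` equals $-F^T F$, as in the proof of Theorem 8, and the halving search rejects the full step and accepts $\lambda = 1/2$.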
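Recall from Lemma 1 that one step of the uncorrected implicit method is identical to solving the linear system of equations (3.3.5b), so that $\Delta X = -J_I^{-1} \tilde F$. Hence the directional derivative of $\|F(X)\|^2$ at $X^k$ in the direction $\Delta X = (\Delta x_1^k, \dots, \Delta x_q^k, \Delta x_{q+1}^k)$ is given by

$$ p = -F(X^k)^T J(X^k) J_I^{-1} \tilde F, \qquad (4.1.2) $$

where

$$ \tilde F = \Big[\sum_{l=0}^{j-1} f_1(x_1^{k,l}, x_{q+1}^k), \dots, \sum_{l=0}^{j-1} f_q(x_q^{k,l}, x_{q+1}^k), f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k)\Big]^T $$

and $J_I$ is the coefficient matrix in (3.3.5b). If only one inner iteration is applied ($j = 1$), (4.1.2) becomes

$$ p = -F(X^k)^T J(X^k) J_I^{-1} F(X^k). \qquad (4.1.3) $$

The product $J(X^k) J_I^{-1}$ yields a full matrix, which is given by

$$ J(X^k) J_I^{-1} = \begin{pmatrix} I - B_1 \tilde P^{-1} C_1 A_1^{-1} & -B_1 \tilde P^{-1} C_2 A_2^{-1} & \cdots & -B_1 \tilde P^{-1} C_q A_q^{-1} & B_1 \tilde P^{-1} \\ -B_2 \tilde P^{-1} C_1 A_1^{-1} & I - B_2 \tilde P^{-1} C_2 A_2^{-1} & \cdots & -B_2 \tilde P^{-1} C_q A_q^{-1} & B_2 \tilde P^{-1} \\ \vdots & & \ddots & & \vdots \\ -B_q \tilde P^{-1} C_1 A_1^{-1} & -B_q \tilde P^{-1} C_2 A_2^{-1} & \cdots & I - B_q \tilde P^{-1} C_q A_q^{-1} & B_q \tilde P^{-1} \\ (I - P \tilde P^{-1}) C_1 A_1^{-1} & (I - P \tilde P^{-1}) C_2 A_2^{-1} & \cdots & (I - P \tilde P^{-1}) C_q A_q^{-1} & P \tilde P^{-1} \end{pmatrix}. \qquad (4.1.4) $$

When the entries of the coupling blocks $B_i$ and $C_i$ ($i = 1, \dots, q$) are large enough, $J(X^k) J_I^{-1}$ may not be positive definite, so the step given by the uncorrected implicit method may not be a descent direction.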
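The possible failure of descent described above can be seen on a tiny hypothetical example with $q = 1$ and $1 \times 1$ blocks ($A_1 = 1$, $B_1 = 3$, $C_1 = -3$, $P = 1$, $F(X^k) = (1, 1)^T$; none of these values come from the paper). The sketch forms $J(X^k) J_I^{-1}$ as in (4.1.4) and evaluates $p$ from (4.1.3) for the uncorrected step, alongside $p$ for the explicit step.

```python
# Hypothetical 1x1 blocks with q = 1 (illustrative values, not from the paper).
A1, B1, C1, P = 1.0, 3.0, -3.0, 1.0
P_tilde = P - C1 * B1 / A1               # bottom-right block of J_I in (3.3.5b)

J = [[A1, B1], [C1, P]]                  # full Jacobian with structure (1.3)
J_I_inv = [[1.0 / A1, 0.0],              # inverse of the lower block-triangular J_I
           [-C1 / (A1 * P_tilde), 1.0 / P_tilde]]


def matmul(X, Y):
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


M = matmul(J, J_I_inv)                   # J(X^k) J_I^{-1}, cf. (4.1.4)
F = [1.0, 1.0]                           # hypothetical residual F(X^k)
p_uncorrected = -sum(f * g for f, g in zip(F, matvec(M, F)))   # (4.1.3)
p_explicit = -sum(f * f for f in F)                            # Theorem 8
print(M)
print(p_uncorrected, p_explicit)
```

Here $p$ for the uncorrected implicit step is positive (about 0.4), so that step is not a descent direction for these values, while the explicit step gives $p = -\|F\|^2 = -2$.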

Now we consider the corrected implicit method with a line search on the inner iteration. If the Newton direction $\Delta x_i^{k,l} = -A_i^{-1} f_i(x_i^{k,l}, x_{q+1}^k)$ ($l = 0, \dots, j-1$, $i = 1, \dots, q$) in the inner iteration is a descent direction for $\|f_i(x_i^{k,l}, x_{q+1}^k)\|$, a line search global strategy can be applied at the end of each inner iteration:

$$ x_i^{k,l+1} = x_i^{k,l} + \lambda_{l,i}\,\Delta x_i^{k,l}, \qquad l = 0, \dots, j-1, \quad i = 1, \dots, q, $$

where $\lambda_{l,i}$ is the distributed line search parameter for the $i$th block in the $(l+1)$th iteration, chosen so that

$$ \|f_i(x_i^{k,0}, x_{q+1}^k)\| \ge \|f_i(x_i^{k,1}, x_{q+1}^k)\| \ge \cdots \ge \|f_i(x_i^{k,j-1}, x_{q+1}^k)\|, \qquad i = 1, \dots, q. \qquad (4.1.5) $$

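Theorem 9. If $f_{q+1}$ is linear and an inner line search satisfying (4.1.5) is required, then the necessary and sufficient conditions for the corrected implicit direction $(\Delta x_1^k, \dots, \Delta x_q^k, \Delta x_{q+1}^k)$ to be a descent direction from $(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k)$, where $j = 1, 2, \dots$ is the number of inner iterations, are

(1) $j = 1$;
or (2) $j = 2$ and

$$ \sum_{i=1}^{q} \lambda_{0,i} \|f_i(x_i^{k,0}, x_{q+1}^k)\|^2 + \|f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k)\|^2 + \sum_{i=1}^{q} \lambda_{1,i}\, f_i(x_i^{k,0}, x_{q+1}^k)^T f_i(x_i^{k,1}, x_{q+1}^k) > 0; $$

or (3) $j > 2$ and

$$ \sum_{i=1}^{q} \lambda_{0,i} \|f_i(x_i^{k,0}, x_{q+1}^k)\|^2 + \|f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k)\|^2 + \sum_{i=1}^{q} \sum_{l=1}^{j-1} \lambda_{l,i}\, f_i(x_i^{k,0}, x_{q+1}^k)^T f_i(x_i^{k,l}, x_{q+1}^k) > 0. $$

Some sufficient conditions, which are valid given (4.1.5), are:
(1) $j = 2$ and $\lambda_{0,i} > \lambda_{1,i}$, $i = 1, \dots, q$;
or (2) $j > 2$ and

$$ \sum_{i=1}^{q} \lambda_{0,i} \|f_i(x_i^{k,0}, x_{q+1}^k)\|^2 + \|f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k)\|^2 > \sum_{i=1}^{q} \sum_{l=1}^{j-1} \lambda_{l,i}\, \|f_i(x_i^{k,0}, x_{q+1}^k)\|\,\|f_i(x_i^{k,l}, x_{q+1}^k)\|. $$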

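Proof: Based on (3.3.5a) in Lemma 1, one step of the corrected implicit method with inner line search is identical to solving the following linear system of equations, similar to (3.3.5a):

$$ \begin{pmatrix} A_1 & & & B_1 \\ & \ddots & & \vdots \\ & & A_q & B_q \\ C_1 & \cdots & C_q & P \end{pmatrix} \begin{pmatrix} \Delta x_1^k \\ \vdots \\ \Delta x_q^k \\ \Delta x_{q+1}^k \end{pmatrix} = - \begin{pmatrix} \sum_{l=0}^{j-1} \lambda_{l,1} f_1(x_1^{k,l}, x_{q+1}^k) \\ \vdots \\ \sum_{l=0}^{j-1} \lambda_{l,q} f_q(x_q^{k,l}, x_{q+1}^k) \\ f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k) \end{pmatrix}. \qquad (4.1.6) $$

Since the coefficient matrix in (4.1.6) is $J(X^k)$, the directional derivative of $\|F(X)\|^2$ at $X^k$ in the direction $\Delta X = (\Delta x_1^k, \dots, \Delta x_q^k, \Delta x_{q+1}^k)$ is given by

$$ p = F(X^k)^T J(X^k)\,\Delta X = -F(X^k)^T \bar F, \qquad (4.1.7) $$

where

$$ \bar F = \Big[\sum_{l=0}^{j-1} \lambda_{l,1} f_1(x_1^{k,l}, x_{q+1}^k), \dots, \sum_{l=0}^{j-1} \lambda_{l,q} f_q(x_q^{k,l}, x_{q+1}^k), f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k)\Big]^T. $$

Thus

$$ p = -\Big(p_0 + \sum_{i=1}^{q} \sum_{l=1}^{j-1} \lambda_{l,i}\, f_i(x_i^{k,0}, x_{q+1}^k)^T f_i(x_i^{k,l}, x_{q+1}^k)\Big), \qquad (4.1.8) $$

where $p_0 = \sum_{i=1}^{q} \lambda_{0,i} \|f_i(x_i^{k,0}, x_{q+1}^k)\|^2 + \|f_{q+1}(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k)\|^2 > 0$.

Therefore $p < 0$ if and only if

(1) $j = 1$;
or (2) $j = 2$ and $p_0 + \sum_{i=1}^{q} \lambda_{1,i}\, f_i(x_i^{k,0}, x_{q+1}^k)^T f_i(x_i^{k,1}, x_{q+1}^k) > 0$;
or (3) $j > 2$ and $p_0 + \sum_{i=1}^{q} \sum_{l=1}^{j-1} \lambda_{l,i}\, f_i(x_i^{k,0}, x_{q+1}^k)^T f_i(x_i^{k,l}, x_{q+1}^k) > 0$.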

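In order to derive some simpler sufficient conditions, each inner product in (4.1.8) is bounded below using the Cauchy-Schwarz inequality:

$$ f_i(x_i^{k,0}, x_{q+1}^k)^T f_i(x_i^{k,l}, x_{q+1}^k) \ge -\|f_i(x_i^{k,0}, x_{q+1}^k)\|\,\|f_i(x_i^{k,l}, x_{q+1}^k)\|. $$

Then

$$ p \le -\Big(p_0 - \sum_{i=1}^{q} \sum_{l=1}^{j-1} \lambda_{l,i}\, \|f_i(x_i^{k,0}, x_{q+1}^k)\|\,\|f_i(x_i^{k,l}, x_{q+1}^k)\|\Big). $$

Therefore $p < 0$ if

(1) $j = 2$ and $\lambda_{0,i} > \lambda_{1,i}$ ($i = 1, \dots, q$), since $\|f_i(x_i^{k,0}, x_{q+1}^k)\| \ge \|f_i(x_i^{k,1}, x_{q+1}^k)\|$ after the inner line search;
or (2) $j > 2$ and

$$ p_0 > \sum_{i=1}^{q} \sum_{l=1}^{j-1} \lambda_{l,i}\, \|f_i(x_i^{k,0}, x_{q+1}^k)\|\,\|f_i(x_i^{k,l}, x_{q+1}^k)\|. $$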

A special case of Theorem 9 is $\lambda_{l,i} = 1$ ($l = 0, \dots, j-1$, $i = 1, \dots, q$), that is, no line search is applied in the inner iteration. Writing $X^{k,l} = (x_1^{k,l}, \dots, x_q^{k,l}, x_{q+1}^k)$ and $\hat F(X^{k,l}) = [f_1(x_1^{k,l}, x_{q+1}^k), \dots, f_q(x_q^{k,l}, x_{q+1}^k)]^T$ for the interior residuals, the necessary and sufficient conditions are then:

(1) $j = 1$;
or (2) $j = 2$ and $\|F(X^{k,0})\|^2 + \hat F(X^{k,0})^T \hat F(X^{k,1}) > 0$;
or (3) $j > 2$ and $\|F(X^{k,0})\|^2 + \sum_{l=1}^{j-1} \hat F(X^{k,0})^T \hat F(X^{k,l}) > 0$.

Some sufficient conditions are

(1) $j = 2$ and $\|F(X^{k,0})\| > \|\hat F(X^{k,1})\|$;
or (2) $j > 2$ and $\|F(X^{k,0})\| > \sum_{l=1}^{j-1} \|\hat F(X^{k,l})\|$.
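The quantity $p$ in (4.1.8) and the Cauchy-Schwarz-based sufficient test can be computed directly from the history of inner residuals. The sketch below is illustrative only: the helper names (`theorem9_p`, `sufficient_condition`) are hypothetical, each residual is a plain list of floats, and the small example data are made up.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _p0(f_hist, lam, f_border0):
    """p_0 = sum_i lambda_{0,i} ||f_i^{k,0}||^2 + ||f_{q+1}^{k,0}||^2."""
    return (sum(lam[i][0] * dot(h[0], h[0]) for i, h in enumerate(f_hist))
            + dot(f_border0, f_border0))


def theorem9_p(f_hist, lam, f_border0):
    """p from (4.1.8).

    f_hist[i][l] -- residual f_i(x_i^{k,l}, x_{q+1}^k), l = 0, ..., j-1
    lam[i][l]    -- inner step length lambda_{l,i}
    f_border0    -- residual f_{q+1}(x_1^{k,0}, ..., x_q^{k,0}, x_{q+1}^k)
    """
    cross = sum(lam[i][l] * dot(h[0], h[l])
                for i, h in enumerate(f_hist) for l in range(1, len(h)))
    return -(_p0(f_hist, lam, f_border0) + cross)


def sufficient_condition(f_hist, lam, f_border0):
    """Test p_0 > sum_i sum_l lambda_{l,i} ||f_i^{k,0}|| ||f_i^{k,l}||."""
    bound = sum(lam[i][l] * math.sqrt(dot(h[0], h[0])) * math.sqrt(dot(h[l], h[l]))
                for i, h in enumerate(f_hist) for l in range(1, len(h)))
    return _p0(f_hist, lam, f_border0) > bound


# One block, j = 2; the residual halves and changes sign (consistent with (4.1.5)).
good = ([[[1.0], [-0.5]]], [[1.0, 1.0]], [0.0])
# The residual grows (violating (4.1.5)); the direction is then not a descent direction.
bad = ([[[1.0], [-2.0]]], [[1.0, 1.0]], [0.0])
print(theorem9_p(*good), sufficient_condition(*good))   # -0.5 True
print(theorem9_p(*bad), sufficient_condition(*bad))     # 1.0 False
```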
4.2  The implementation of the global strategy for the corrected implicit method

The idea of a globally convergent modification of the corrected implicit method is to try the method with step length one first at each iteration. If it seems to be taking a reasonable step, that is, if $\|F(x_1^{k+1,0}, \dots, x_q^{k+1,0}, x_{q+1}^{k+1})\|$ decreases sufficiently, then use it. If not, fall back on a step dictated by the line search method. Such a strategy will always end up using the full corrected implicit method step close to the solution, and thus retain its fast local convergence rate. If the global method, such as a line search, is chosen and incorporated properly, the method will also be globally convergent under appropriate conditions.

Our analysis shows that the conditions for a descent direction without inner line search are stronger than the conditions with inner line search, since the latter are guaranteed to be satisfied when $j = 2$ and $\lambda_{0,i} - \lambda_{1,i} > 0$ ($i = 1, \dots, q$). Our experiments show that the total number of iterations decreases most sharply at $j = 2$. Thus we would choose to implement the global strategy with inner line search. When $j > 2$, $(\Delta x_1^k, \dots, \Delta x_q^k, \Delta x_{q+1}^k)$ is a descent direction, and the line search can be applied at the end of the iteration, if condition (3) of Theorem 9 is satisfied. If the condition is not satisfied, the line search is applied not at the current $j$ point but at the $(j-1)$ point, which satisfies the conditions for a descent direction. Then a new iteration is started. The detailed corrected implicit method with a global line search strategy is given by:

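Inner Newton step

1. Set $j = 0$ and $j_m$ = fixed inner iteration number ($j_m \ge 2$).

2. Do

   (a) Solve $A_i(x_i^{k,0}, x_{q+1}^k)\,\Delta x_i^{k,j} = -f_i(x_i^{k,j}, x_{q+1}^k)$ for $\Delta x_i^{k,j}$ ($i = 1, \dots, q$);

   (b) inner line search: $x_i^{k,j+1} = x_i^{k,j} + \lambda_{j,i}\,\Delta x_i^{k,j}$ for some $\lambda_{j,i} > 0$ so that $\|f_i(x_i^{k,j+1}, x_{q+1}^k)\| < \|f_i(x_i^{k,j}, x_{q+1}^k)\|$ ($i = 1, \dots, q$);

   (c) set $j = j + 1$;

   (d) if ($j = 1$) then
          set $\lambda_{1,i} = \lambda_{0,i}$, $i = 1, \dots, q$
       endif

   Until {$j = j_m$ or condition (3) is not true}

   if (condition (3) is not true)
      set $x_i^{k+1,0} = x_i^{k,j-1}$ ($i = 1, \dots, q$)
   else
      set $x_i^{k+1,0} = x_i^{k,j}$ ($i = 1, \dots, q$)
   endif

Main Newton step

3. Form $\hat J = P - \sum_{i=1}^{q} C_i A_i^{-1} B_i$, evaluated at $(x_i^{k,0}, x_{q+1}^k)$.

4. Factorize $\hat J = L_{q+1} U_{q+1}$.

5. Solve $\hat J\,\Delta x_{q+1}^k = -f_{q+1}(x_1^{k+1,0}, \dots, x_q^{k+1,0}, x_{q+1}^k)$ for $\Delta x_{q+1}^k$.

6. Do correction: $\hat x_i^{k+1,0} = x_i^{k+1,0} - A_i^{-1} B_i\,\Delta x_{q+1}^k$ ($i = 1, \dots, q$).

7. Outer line search: $x_i^{k+1,0} = x_i^{k,0} + \lambda^k(\hat x_i^{k+1,0} - x_i^{k,0})$ ($i = 1, \dots, q$), and $x_{q+1}^{k+1} = x_{q+1}^k + \lambda^k\,\Delta x_{q+1}^k$, for some $\lambda^k > 0$ so that $\|F(x_1^{k+1,0}, \dots, x_q^{k+1,0}, x_{q+1}^{k+1})\| < \|F(x_1^{k,0}, \dots, x_q^{k,0}, x_{q+1}^k)\|$.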
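A compact sketch of steps 1-7 follows, on a hypothetical scalar toy problem (not from the paper) with $f_i = x_i + 0.5x_i^3 - x_{q+1} - c_i$ and a linear $f_{q+1}$. Assumptions not taken from the paper: both line searches simply halve the step until a decrease is obtained; the descent test of Theorem 9 is applied after every inner step, falling back to the previous inner point when it fails; step (d) is not modeled; and since the toy bottom block is $1 \times 1$, the factorization in step 4 reduces to a division. Function names are illustrative.

```python
import math

# Hypothetical toy problem with q = 3 scalar blocks (not from the paper);
# its solution is x = (1, 0, -1), x_{q+1} = 1.
C_RHS = [0.5, -1.0, -2.5]
D_RHS = 3.0
B, C, P = -1.0, 1.0, 3.0


def f_block(i, xi, y):
    return xi + 0.5 * xi**3 - y - C_RHS[i]


def A(xi):
    return 1.0 + 1.5 * xi**2


def f_border(x, y):
    return C * sum(x) + P * y - D_RHS


def F_norm(x, y):
    r = [f_block(i, xi, y) for i, xi in enumerate(x)] + [f_border(x, y)]
    return math.sqrt(sum(t * t for t in r))


def halve_until(accept, max_tries=40):
    """First lam in 1, 1/2, 1/4, ... with accept(lam) true (else the last one tried)."""
    lam = 1.0
    for _ in range(max_tries):
        if accept(lam):
            return lam
        lam *= 0.5
    return lam


def global_corrected_implicit(x, y, j_m=2, tol=1e-10, max_outer=100):
    q = len(x)
    for _ in range(max_outer):
        if F_norm(x, y) < tol:
            break
        A0 = [A(xi) for xi in x]                 # A_i at (x_i^{k,0}, x_{q+1}^k)
        f0 = [f_block(i, x[i], y) for i in range(q)]
        margin = f_border(x, y) ** 2             # running value of -p in (4.1.8)
        xs = list(x)                             # inner iterates x_i^{k,l}
        for _l in range(j_m):                    # step 2: inner Newton steps
            x_prev, added = list(xs), 0.0
            for i in range(q):
                fi = f_block(i, xs[i], y)
                dxi = -fi / A0[i]                                             # 2(a)
                lam = halve_until(
                    lambda t: abs(f_block(i, xs[i] + t * dxi, y)) < abs(fi))   # 2(b)
                added += lam * f0[i] * fi        # lambda_{l,i} f_i^{k,0} f_i^{k,l}
                xs[i] += lam * dxi
            if margin + added <= 0.0:            # descent test of Theorem 9 fails:
                xs = x_prev                      # fall back to the previous inner point
                break
            margin += added
        J_hat = P - sum(C * B / a for a in A0)               # steps 3-4
        dy = -f_border(xs, y) / J_hat                        # step 5
        x_hat = [xi - B * dy / a for xi, a in zip(xs, A0)]   # step 6: correction
        n0 = F_norm(x, y)

        def trial(t):
            return [xi + t * (xh - xi) for xi, xh in zip(x, x_hat)], y + t * dy

        lam = halve_until(lambda t: F_norm(*trial(t)) < n0)  # step 7: outer search
        x, y = trial(lam)
    return x, y


x, y = global_corrected_implicit([0.0, 0.0, 0.0], 0.0)
print(x, y, F_norm(x, y))
```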

4.3  Summaries of the two types of methods

We give the following summary, based on our experimental comparisons and analysis of the explicit method and the implicit methods.

(1) The implicit (uncorrected and corrected) methods require more function evaluations per iteration than the explicit method, since more than one inner iteration is applied, but possibly fewer total iterations.

(2) The corrected implicit method has an asymptotic convergence rate at least as fast as the explicit method, since it retains the quadratic convergence rate, and it is slightly faster than the uncorrected implicit method. Both the uncorrected and the corrected implicit methods may speed up the convergence of the interior variables.

(3) A global strategy such as a line search can be applied to the corrected implicit method to ensure global convergence, subject to limited restrictions.

(4) The implicit methods will be shown to have additional advantages on parallel computers in the next sections.


5.  Parallel  solutions  to  the  problem 

We will briefly discuss a range of possible strategies for parallelizing the solution of block bordered systems of nonlinear equations. Which strategies are best depends on the nature of the nonlinearities and the sparsity structure of the problem, as well as on the characteristics of the parallel machine being used. We intend to implement these strategies and variations of them on both shared memory parallel machines, such as the Encore Multimax, and local memory parallel machines, such as the Intel Hypercube.

5.1  Parallel  explicit  method 


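The explicit method involves iteratively solving the linear system (3.1). Thus the parallel method focuses on how to solve the block bordered linear system $J(X^k)\,\Delta X^k = -F(X^k)$, which is of the form

$$ \begin{pmatrix} A_1 & & & & B_1 \\ & A_2 & & & B_2 \\ & & \ddots & & \vdots \\ & & & A_q & B_q \\ C_1 & C_2 & \cdots & C_q & P \end{pmatrix} \begin{pmatrix} \Delta x_1 \\ \Delta x_2 \\ \vdots \\ \Delta x_q \\ \Delta x_{q+1} \end{pmatrix} = \begin{pmatrix} -f_1 \\ -f_2 \\ \vdots \\ -f_q \\ -f_{q+1} \end{pmatrix}, \qquad (5.1.1) $$

in parallel.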


Recall that $\Delta x_i$ ($i = 1, \dots, q$) and $\Delta x_{q+1}$ can be explicitly solved as follows:

$$ \Delta x_i = -A_i^{-1} f_i - A_i^{-1} B_i\,\Delta x_{q+1} \qquad (5.1.2) $$

and

$$ \Big(P - \sum_{i=1}^{q} C_i A_i^{-1} B_i\Big)\Delta x_{q+1} = -f_{q+1} + \sum_{i=1}^{q} C_i A_i^{-1} f_i. \qquad (5.1.3) $$

Obviously, the $q$ factorizations $A_i = L_i U_i$ and the $q$ solutions of $A_i^{-1} f_i$ and $A_i^{-1} B_i$, $i = 1, \dots, q$, may be performed concurrently. But the other operations do not decompose as obviously. Thus the following basic operations come directly from (5.1.2) and (5.1.3):

1. factorize $A_i$ ($i = 1, \dots, q$) in parallel.

2. solve $A_i z_i = f_i$ ($i = 1, \dots, q$) for $z_i = A_i^{-1} f_i$ in parallel.

3. solve $A_i W_i = B_i$ ($i = 1, \dots, q$) for $W_i = A_i^{-1} B_i$ in parallel.

*4. form $\hat J = P - \sum_{i=1}^{q} C_i W_i$.

*5. factorize $\hat J$.

*6. solve $\hat J\,\Delta x_{q+1} = -f_{q+1} + \sum_{i=1}^{q} C_i z_i$ for $\Delta x_{q+1}$.

7. $\Delta x_i = -z_i - W_i\,\Delta x_{q+1}$ ($i = 1, \dots, q$) in parallel.

8. $x_i^{k+1} = x_i^k + \Delta x_i^k$ ($i = 1, \dots, q$) in parallel.

*9. $x_{q+1}^{k+1} = x_{q+1}^k + \Delta x_{q+1}^k$.

The steps marked with stars require some synchronization on a shared memory multiprocessor, or some message passing among the nodes on a local memory multiprocessor, to parallelize. We discuss their implementations next.

On a shared memory multiprocessor, data is stored in the shared memory, where it can be accessed by all processors through an interconnection network. Steps 1, 2, 3, 7 and 8 are independent data operations and may be fully parallelized. The matrix multiplications and subtractions in step 4 and the addition in step 9 are also independent data operations on a shared memory multiprocessor, which may be fully parallelized. Steps 5 and 6 solve a linear system of equations by first factoring $\hat J$ and then back solving for $\Delta x_{q+1}$. These operations involve dependent data operations, and synchronizations are required for the computations on a shared memory multiprocessor. Many parallel algorithms for LU decomposition and back solving on a shared memory multiprocessor have been developed (see e.g. Jordan [1985]).

Thus, on a shared memory multiprocessor, the operations of the explicit method may be fully parallelized except for steps 5 and 6, which involve some synchronization. Although the synchronizations seem minor in comparison with the parallel operations in those two steps, the bottleneck of the explicit method, if any, will come from solving the bottom linear system of equations at each iteration.
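The shared-memory organization of steps 1-9 can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`: the per-block steps 1-3 and 7-8 are mapped over the blocks, while the starred steps run once. This is only a structural sketch under stated assumptions: it uses $1 \times 1$ blocks so that each factorization and solve is a division, CPython threads will not give a real speed-up for this arithmetic, and the input values are those of a hypothetical scalar toy problem ($f_i = x_i + 0.5x_i^3 - x_{q+1} - c_i$, linear $f_{q+1}$) evaluated at $X = 0$. `parallel_explicit_step` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_explicit_step(A, B, C, P, f, f_border, x, y, pool):
    """One explicit Newton step organized as steps 1-9 (1x1 blocks).

    A, B, C, f, x are length-q lists of floats; P, f_border, y are floats.
    """
    q = len(A)
    # Steps 1-3 (independent per block): z_i = A_i^{-1} f_i and W_i = A_i^{-1} B_i.
    zw = list(pool.map(lambda i: (f[i] / A[i], B[i] / A[i]), range(q)))
    # Steps *4-*6 (need synchronization): form and solve the bottom system.
    J_hat = P - sum(C[i] * zw[i][1] for i in range(q))
    dy = (-f_border + sum(C[i] * zw[i][0] for i in range(q))) / J_hat
    # Steps 7-8 (independent per block): dx_i = -z_i - W_i dy, then x_i + dx_i.
    x_new = list(pool.map(lambda i: x[i] - zw[i][0] - zw[i][1] * dy, range(q)))
    # Step *9.
    return x_new, y + dy


with ThreadPoolExecutor(max_workers=3) as pool:
    x_new, y_new = parallel_explicit_step(
        A=[1.0, 1.0, 1.0], B=[-1.0, -1.0, -1.0], C=[1.0, 1.0, 1.0], P=3.0,
        f=[-0.5, 1.0, 2.5], f_border=-3.0, x=[0.0, 0.0, 0.0], y=0.0, pool=pool)
print(x_new, y_new)   # [1.5, 0.0, -1.5] 1.0
```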

On a local memory multiprocessor, there is only local memory associated with each processor, and data is passed among the processors through a connection network. Since data is not shared, a distributed data structure is associated with the parallel algorithm. In our application, processor $p_i$, $i = 0, \dots, p-1$, will store the following data:

Block $A_i$ or a group of blocks $A_i$;

Block $C_i$ or a group of blocks $C_i$;

Block $B_i$ or a group of blocks $B_i$;

Blocks $\Delta x_i$ and $x_i$ or groups of blocks $\Delta x_i$ and $x_i$.

An efficient LU factorization algorithm needs to minimize the communication costs among the processors and keep all the processors working in parallel. Current fast parallel LU factorizations on local memory multiprocessors (see e.g. Moler [1986]) require the columns of the coefficient matrix to be evenly distributed among the processors. In order to keep all processors working efficiently, the matrix is distributed in the following order: column $j$ is in processor $(j-1) \bmod p$. This kind of storage is called wrap mapping. Thus the columns of the $P$ matrix are distributed in wrap mapping among the processors. $\Delta x_{q+1}$, $x_{q+1}$ and $f_{q+1}$ are stored in the control processor, say $p_0$.
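Wrap mapping is easy to express as an index function. The sketch below uses hypothetical helper names and 1-based column numbers as in the text, and lists which columns each processor holds for a small example.

```python
def wrap_owner(col, p):
    """Processor index that stores column `col` (1-based) under wrap mapping."""
    return (col - 1) % p


def columns_on(proc, n, p):
    """Columns (1-based) of an n-column matrix held by processor `proc`."""
    return list(range(proc + 1, n + 1, p))


# Example: a 10-column matrix on p = 4 processors.
for proc in range(4):
    print(proc, columns_on(proc, 10, 4))
# 0 [1, 5, 9]
# 1 [2, 6, 10]
# 2 [3, 7]
# 3 [4, 8]
```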

Based on this distributed data structure, steps 1, 2, 3 and 8 in the explicit algorithm are independent data operations without any data communication and may be fully parallelized on a local memory multiprocessor. Step 7 is also an independent data operation after $\Delta x_{q+1}$ is broadcast from $p_0$. Since $\Delta x_{q+1}$ and $x_{q+1}$ are stored in $p_0$, step 9 is a single process in $p_0$. This sequential operation has a minor effect on the parallel performance, since the computation is small compared with the other parallel operations.

The columns of $\hat J$ are required to be distributed in wrap fashion for efficiently solving the linear system of equations on a local memory multiprocessor, and the columns of $P$ are already distributed in wrap mapping among the processors. Thus, forming $\hat J$ in parallel on a local memory multiprocessor requires some message passing among the processors.

Step 6 in the explicit method involves solving a single (lower or upper) triangular linear system of equations in parallel. This would be hard on a local memory multiprocessor, and would be especially hard in the case where the matrix is distributed by columns instead of by rows. There has been some recent progress on this problem (see e.g. Romine & Ortega [1986], Li & Coleman [1986], [1987]). Li and Coleman's methods require the columns of the (upper or lower) triangular matrix to be distributed to $p$ processors in a wrap mapping. The computation is not perfectly parallelized, and the speedup increases as $n$ increases.

We would apply a distributed sequential method to solve the triangular system, without any extra communication, when $n$ is small. When $n$ is large, we would apply Li and Coleman's method to solve the triangular system, since the columns have already been stored in wrap fashion during the factorization.


Steps 4, 5 and 6 in the explicit method involve data communication. Thus the bottleneck of the parallel explicit method is forming $\hat J$ and solving the bottom system of linear equations.

5.2  Parallel  implicit  method 

One portion of an iteration in the implicit method is to solve each of the q equations

$f_i(x_i, x_{q+1}^{k}) = 0, \quad i = 1, \ldots, q$   (5.2.1)

in parallel for a fixed value of $x_{q+1}^{k}$. Newton's method may be used to solve (5.2.1). Then $\Delta x_{q+1}$ is obtained by solving

$J \Delta x_{q+1} + f_{q+1}(x_1^{k,*}, \ldots, x_q^{k,*}, x_{q+1}^{k}) = 0$.   (5.2.2)


Based on (5.2.1) and (5.2.2), the parallel implicit method is given by:

Inner Newton step

1. j = 0.

2. Solve $A_i \Delta x_i^{j} = -f_i$ at the points $(x_i^{k,j}, x_{q+1}^{k})$ for $\Delta x_i^{j}$ in parallel.

(a) $x_i^{k,j+1} = x_i^{k,j} + \Delta x_i^{j}$.

(b) If $x_i^{k,j+1}$ is not "precise" enough, set j = j + 1 and go to step 2. Otherwise, continue.

(c) Set $x_i^{k,*} = x_i^{k,j+1}$ in parallel.

3. Solve $A_i W_i = B_i$ $(i = 1, \ldots, q)$ for $W_i = A_i^{-1} B_i$ in parallel.

Main Newton step

*4. Form $J = P - \sum_{i=1}^{q} C_i W_i$.

*5. Factorize $J = L_{k+1} U_{k+1}$.

*6. Solve $J \Delta x_{q+1} = -f_{q+1}(x_1^{k,*}, \ldots, x_q^{k,*}, x_{q+1}^{k})$ for $\Delta x_{q+1}$.

7. Do the correction: $x_i^{k+1,0} = x_i^{k,*} - W_i \Delta x_{q+1}$.

8. $x_{q+1}^{k+1} = x_{q+1}^{k} + \Delta x_{q+1}$.
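To make steps 1 to 8 concrete, here is a minimal sequential sketch of the corrected implicit iteration. It uses a hypothetical test problem in which every block $x_i$ and the border variable $x_{q+1}$ (called `y` in the code) is a scalar, so $A_i$, $B_i$, $C_i$, P and J are all numbers. The problem, its derivatives and the tolerances are assumptions chosen for illustration. The loops marked "parallel over i" are the ones the text distributes across processors. With scalar blocks, the factorization and solve in steps 5 and 6 reduce to a single division.

```python
# Hypothetical test problem (not from the text): q = 2 scalar blocks,
#   f_i(x_i, y) = x_i**3 + x_i - y - c_i,   i = 1, 2
#   f_3(x_1, x_2, y) = x_1 + x_2 + y - 4
# with exact solution x_1 = 1, x_2 = 2, y = 1.
C_CONST = [1.0, 9.0]


def f_block(i, xi, y):
    return xi ** 3 + xi - y - C_CONST[i]


def A_block(i, xi, y):   # df_i / dx_i
    return 3.0 * xi ** 2 + 1.0


def B_block(i, xi, y):   # df_i / dy
    return -1.0


def f_border(x, y):      # f_{q+1}
    return sum(x) + y - 4.0


def C_border(i, x, y):   # df_{q+1} / dx_i
    return 1.0


def P_border(x, y):      # df_{q+1} / dy
    return 1.0


def corrected_implicit(x, y, tol=1e-12, max_outer=50, max_inner=50):
    q = len(x)
    x = list(x)
    for _ in range(max_outer):
        # Steps 1-2: inner Newton on f_i(x_i, y) = 0 with y fixed (parallel over i).
        for i in range(q):
            for _ in range(max_inner):
                dx = -f_block(i, x[i], y) / A_block(i, x[i], y)
                x[i] += dx
                if abs(dx) <= tol:
                    break
        # Step 3: W_i = A_i^{-1} B_i (parallel over i).
        W = [B_block(i, x[i], y) / A_block(i, x[i], y) for i in range(q)]
        # Steps 4-6: J = P - sum_i C_i W_i, then solve J dy = -f_{q+1}.
        J = P_border(x, y) - sum(C_border(i, x, y) * W[i] for i in range(q))
        dy = -f_border(x, y) / J
        # Step 7: correction x_i <- x_i - W_i dy.
        x = [x[i] - W[i] * dy for i in range(q)]
        # Step 8: update the border variable.
        y += dy
        if abs(dy) <= tol:
            break
    return x, y


x_sol, y_sol = corrected_implicit([0.5, 1.5], 0.5)
```

The correction in step 7 is a first-order prediction of how each $x_i$ moves when y changes. It gives the next inner Newton solve a better starting point.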

The data structures and operations of the implicit method are almost exactly the same as those of the explicit method, although the two are different methods and have different performance. The implicit method is expected to need more inner iterations and fewer total iterations than the explicit method. However, the implementation of each of these steps on both shared memory and local memory multiprocessors is roughly the same as for the parallel explicit method described in section 5.1.

5.3 What can we gain from the implicit method in parallel
The analysis in section 3 indicates that the corrected implicit method converges at least as fast as the explicit method. Assume that the total computing times for solving a given nonlinear problem by the explicit method and by the corrected implicit method are identical on a sequential processor. Then it is easy to see that the parallel corrected implicit method will be more efficient than the parallel explicit method on a multiprocessor, especially on a local memory multiprocessor. As noted above, the bottleneck of the explicit and implicit methods on either type of multiprocessor is to (form J and) solve the bottom system of equations, which involves synchronization or data communication. If the implicit method has more inner iterations and fewer total iterations than the explicit method, then it forms J and solves the bottom linear system of equations fewer times than the explicit method does. Reducing this bottleneck helps parallel performance more on a local memory multiprocessor than on a shared memory multiprocessor, for two reasons. First, the formation of J may be fully parallelized on a shared memory multiprocessor. Second, communication delays on a local memory multiprocessor are usually significantly larger than synchronization delays on a shared memory multiprocessor.

6.  Conclusions  and  future  work 

We have studied three methods for solving block bordered systems of nonlinear equations: the explicit, implicit and corrected implicit methods. The following conclusions are obtained from our analysis and experiments:

(1) The corrected implicit method retains the quadratic convergence rate of the explicit method, and
appears  to  converge  a  little  faster  in  practice. 

(2)  The  steps  of  the  corrected  implicit  method  are  in  descent  directions  under  some  limited  conditions. 

(3) Both the explicit method and the implicit methods should get reasonably good speedup on a shared memory multiprocessor. The implicit methods should gain more if solving the system with J is expensive.

The next stage of this research will be to complete the implementations of the parallel methods on a shared memory multiprocessor, the Encore Multimax, and on a local memory multiprocessor, the Intel hypercube. We will then study the performance of the methods when J is full, sparse and very sparse. A load balancing problem may occur on a local memory multiprocessor in applications where the diagonal blocks $A_i$ have different sizes. We also plan to study this issue.

The methods we have discussed assume that the Jacobian matrix of the block bordered nonlinear system (1.2) is available. However, in many practical applications, the Jacobian matrix is not given by a set of formulas; rather, it is the output of some computational or experimental procedure. In this case, secant methods (such as Broyden's method) are often used to solve (1.1) (see, e.g., Dennis and Schnabel [1983]). We also intend to develop a secant method for solving block bordered systems of nonlinear equations.
Preface


A complete study of the principal nth root of a complex matrix and associated matrix-valued functions is presented in this research monograph. This includes the development of techniques to compute the principal nth root of a matrix, the study of associated matrix-valued functions, and their applications to mathematical sciences and control systems. First, a computationally fast and numerically more stable algorithm has been developed to compute the principal nth root of a complex matrix without explicitly using its eigenvalues and/or eigenvectors. The principal nth root of a matrix is shown to be useful for the following tasks:

- constructing the matrix-sign function and the (generalized) matrix-sector function;
- solving the matrix Lyapunov and Riccati equations;
- separating matrix eigenvalues relative to a circle, a sector and a sector of a circle in the λ-plane;
- block-diagonalization (parallel decomposition) and block-triangularization (cascaded decomposition) of a general system matrix;
- generalizing the block-partial-fraction expansion of a rational matrix;
- modelling a continuous-time system from the identified discrete-time model.

This research monograph also presents new definitions and computational algorithms for determining the rectangular and polar representations of a complex matrix, and discusses their applications to control systems. Finally, using the developed algorithms, a multi-stage design procedure has been established to design discrete-time controllers that achieve pole-assignment in a specified region for a large-scale discrete-time multivariable system.
Chapter 1
Introduction 


Computational methods for finding the nth roots of some specific matrices have been proposed in [1,3,4,9] and [17]-[22]. Hoskins and Walton [4] used the Newton-Raphson algorithm to derive a fast and stable method for computing the nth roots of positive-definite matrices. Denman et al. [18,19] combined a spectral decomposition technique obtained from the matrix-sign function [17] with the Hoskins-Walton algorithm [6]. Their algorithm computes the nth roots of real and complex matrices without prior knowledge of the eigenvalues and eigenvectors of the matrices. However, the nth root of a general matrix computed by these algorithms is, in general, not the principal nth root of the matrix. The principal nth root method has many applications to mathematical sciences and control systems, such as those listed below:

1) to construct the matrix-sign function [9,17], the matrix-sector function [26,27] and the generalized matrix-sector function, and to solve the matrix Lyapunov and Riccati equations [1,17,23,24,25],

2) to separate matrix eigenvalues relative to a sector, a circle and a sector of a circle in the λ-plane,

3) to obtain the A-invariant space, the block-diagonalization (parallel decomposition) and the block-triangularization (cascaded decomposition) of the system matrix,

4)  to  generalize  block  partial-fraction  expansion  of  a  rational  matrix  [12,13], 

5)  to  model  a  continuous-time  system  from  the  identified  discrete-time  model, 

6)  to  determine  the  rectangular  and  polar  representations  of  a  complex  matrix, 
and 

7) to develop the multi-stage design procedure for designing discrete-time controllers to achieve pole-assignment in a specific region for a large-scale discrete-time multivariable system.

Shieh et al. [20] first proposed an algorithm to compute the principal nth roots of complex matrices. To improve the convergence rate of the algorithm in [20], Tsay et al. [21] derived a fast algorithm that uses the matrix continued-fraction method to compute the principal nth roots of complex matrices. However, the two algorithms [20,21] are not numerically stable. For example, for an ill-conditioned matrix such as a stiff matrix containing both large and small eigenvalues, the algorithms in [20,21] converge in the first few iterations and then diverge very quickly. To overcome this numerical instability, Higham [22] and Shieh et al. [29] have each proposed fast and stable algorithms for computing the principal square root of a complex matrix. Since the algorithms [22,29] are limited to computing the principal square root of a matrix, they cannot be applied to compute the principal nth root of a complex matrix when n is not a power of two.

Because the principal nth root method has so many applications to mathematical sciences and control systems, we have developed a computationally fast and numerically more stable algorithm to compute the principal nth root of a matrix without explicitly using its eigenvalues and/or eigenvectors. Some applications of the principal nth root method to mathematical sciences and control systems are also presented in this research monograph.


The  material  in  this  research  monograph  is  organized  as  follows. 


In  Chapter  2,  based  on  the  generalized  continued-fraction  method  for  finding 
the  nth  roots  of  real  numbers,  a  fast  computational  method  for  finding  the  principal 
nth  root  of  a  complex  matrix  has  been  developed.  Computational  algorithms  with 
high  convergence  rates  are  presented,  and  their  global  convergence  properties  are 
investigated  from  the  viewpoint  of  systems  theory. 


In Chapter 3, rapidly convergent and more stable recursive algorithms for finding the principal nth root of a complex matrix have been developed. The developed algorithms significantly improve the computational aspects of finding the principal nth root of a matrix. They will therefore enhance the capabilities of existing computational algorithms for control-system applications, such as the principal nth root algorithm, the matrix-sign algorithm and the matrix-sector algorithm.


In Chapter 4, the matrix-sector function of A has been generalized to the matrix-sector function of g(A). Here the complex matrix A may have a real or complex characteristic polynomial, and g(A) is a matrix function of a conformal mapping. Building on the computationally fast and numerically more stable algorithms for computing the principal nth root of a complex matrix, rapidly convergent and more stable recursive algorithms have been developed for finding the matrix-sector function and the generalized matrix-sector function. Moreover, the generalized matrix-sector function of A is employed to separate the matrix eigenvalues relative to a sector, a circle, and a sector of a circle in the complex plane without actually computing the characteristic polynomial or the matrix eigenvalues themselves. The generalized matrix-sector function of A is also used to carry out the block-diagonalization and block-triangularization of a system matrix, which are useful in developing applications to mathematical science and control-system problems.


In Chapter 5, fast computational methods are developed for finding the equivalent continuous-time state equations from discrete-time state equations. The methods combine the direct-truncation method, the matrix continued-fraction method and the geometric-series method with the principal nth root of the discrete-time system matrix to determine approximations of a matrix-logarithm function quickly. It is shown that using the principal nth root of a matrix enlarges the convergence region of the expansion of a matrix-logarithm function and improves the accuracy of its approximations.
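A scalar analogue of this claim can be checked in a few lines. The sketch assumes the principal-branch identity $\log a = n \log \sqrt[n]{a}$ and uses the plain Mercator series $\log(1+x) = x - x^2/2 + x^3/3 - \cdots$, which converges only for |x| < 1 (plus some boundary points). It is not one of the three expansions named above. It also takes the root from Python's built-in complex power rather than from the algorithms of Chapters 2 and 3. For a = 5, the series in x = a - 1 = 4 diverges, while x = 5^{1/16} - 1 ≈ 0.106 gives a rapidly convergent series.

```python
import cmath


def log_via_root(a: complex, n: int, terms: int = 30) -> complex:
    """Approximate log(a) as n * log(a**(1/n)) with a truncated Mercator series.

    Python's complex power uses the principal branch, so a**(1/n) is the
    principal nth root when a is nonzero.
    """
    r = complex(a) ** (1.0 / n)   # principal nth root of a
    x = r - 1.0                   # expansion variable, small when n is large
    series = sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))
    return n * series


error = abs(log_via_root(5, 16) - cmath.log(5))   # compare with the library logarithm
```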
In Chapter 6, new definitions are introduced for the real and imaginary parts, and the associated amplitude and phase, of a real or complex matrix. Computational methods for finding these quantities are given; they use the properties of the matrix-sign function and the principal nth root of a complex matrix. A new geometric-series method is developed to approximate the matrix-valued function $\tan^{-1}(X)$, which is the principal branch of the arc tangent of the matrix X.

In Chapter 7, a multi-stage pseudo-continuous-time state-space method is developed for designing a large-scale discrete system that does not explicitly exhibit a two- or multi-time-scale structure. The designed pseudo-continuous-time regulator places the eigenvalues of the closed-loop discrete system within the common region of a circle (concentric within the unit circle) and a logarithmic spiral in the complex z-plane. It does so without explicitly using the open-loop eigenvalues of the given system. The proposed method requires the solution of only small-order Riccati equations at each stage of the design. The principal nth root method is used to obtain a multi-time-scale structure for the proposed design method.

Conclusions are summarized in Chapter 8. Numerical examples are given at the end of each chapter to illustrate the material presented in that chapter.
Chapter 2

A  Fast  Method  for  Computing  the  Principal 
nth  Roots  of  Complex  Matrices 


Based on the generalized continued-fraction method for finding the nth roots of real numbers, this chapter presents a fast computational method for finding the principal nth roots of complex matrices. Computational algorithms with high convergence rates are developed, and their global convergence properties are investigated from the viewpoint of systems theory [21].


2.1  Introduction 

Computational methods for finding the nth root of some specific matrices have been proposed in [1-7]. The matrix-sign function method [1,7], the matrix continued-fraction method [2,5,6] and the Newton-Raphson method [3,4] have been used successfully to determine the square roots of real and complex matrices. These methods have been applied to systems problems such as the matrix Lyapunov and Riccati equations, spectral factorization and solvents of matrix polynomials. Recently, Hoskins and Walton [4] proposed an accelerated, stable Newton-Raphson method for computing the nth root of a positive-definite matrix. Denman and Leyva-Ramos [7] used the extended matrix-sign function [8], a variant of the Newton-Raphson method [9], to find the nth root of a positive-semidefinite matrix. However, the existing Newton-Raphson methods [4,7] cannot, in general, be applied to determine the principal nth roots of complex matrices that are not positive definite or positive semidefinite.

In  this  chapter,  we  shall  extend  the  generalized  continued-fraction  method  [10], 
which  was  developed  for  determining  the  nth  root  of  a  positive  real  number,  to  find 
the  principal  nth  roots  of  a  complex  number  and  a  complex  matrix.  Also,  we  shall 
establish  a  fast  computational  algorithm  for  determining  the  principal  nth  roots  of 
complex matrices that need not be positive definite or positive semidefinite. Moreover, we
shall  investigate  the  global  convergence  properties  of  the  proposed  algorithm  from 
the  viewpoint  of  systems  theory. 
2.2 The Principal nth Roots of Complex Numbers

The  principal  nth  root  of  a  complex  number  is  defined  as  follows. 

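Definition 2.2.1

Let $a = \rho e^{j\theta} \in \mathbb{C}$, where $\rho, \theta \in \mathbb{R}$, $\rho > 0$ and $\theta \in (-\pi, \pi]$. The principal nth root of a is defined as

$\sqrt[n]{a} = \sqrt[n]{\rho}\, e^{j\theta/n}$,   (2.1)

where the real number $\sqrt[n]{\rho}$ with $\sqrt[n]{\rho} > 0$ is the principal nth root of ρ. □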
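Definition 2.2.1 translates directly into code. The sketch assumes that θ is taken in the half-open interval (-π, π]. Python's `cmath.polar` returns the argument in [-π, π], and it returns -π for a negative real number with a negative-zero imaginary part. The code therefore maps -π to +π before dividing by n.

```python
import cmath
import math


def principal_root(a: complex, n: int) -> complex:
    """Principal nth root of a nonzero complex number: rho**(1/n) * exp(j*theta/n)."""
    rho, theta = cmath.polar(a)        # theta in [-pi, pi]
    if rho == 0.0:
        raise ValueError("a must be nonzero")
    if theta == -math.pi:              # use the branch theta in (-pi, pi]
        theta = math.pi
    return cmath.rect(rho ** (1.0 / n), theta / n)


root = principal_root(-8, 3)           # approximately 1 + j*sqrt(3)
```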

Based  on  the  generalized  continued  fractions  [10],  a  recursive  algorithm  with 
the  help  of  matrix  operations  has  been  developed  for  finding  the  nth  root  of  a 
positive  real  number  and  associated  fractional  powers  of  the  positive  real  number. 
The  algorithm  is  described  below. 

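Consider a discrete state equation,

$X(k+1) = H X(k), \quad X(0) = [1, 0, 0, \ldots, 0, 0]^T \in \mathbb{R}^n$,   (2.2a)

where

$X(k) = [x_1(k), x_2(k), \ldots, x_n(k)]^T$   (2.2b)

and

$H = \begin{bmatrix} 1 & a & a & \cdots & a & a \\ 1 & 1 & a & \cdots & a & a \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 1 & 1 & \cdots & 1 & a \\ 1 & 1 & 1 & \cdots & 1 & 1 \end{bmatrix}$.   (2.2c)

The superscript T in (2.2) denotes the transpose operation on a vector. When a in (2.2c) is a positive real number, its fractional powers are determined by

$\lim_{k\to\infty}\frac{x_i(k)}{x_j(k)} = (\sqrt[n]{a})^{j-i}$ for $1 \le i, j \le n$.   (2.3)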


The correctness of the limits in (2.3) has been proved in [10] via continued-fraction approximation theory. In this section, we shall extend the results in (2.2) and (2.3) to include a complex number a with arg(a) ≠ π and a ≠ 0, and we shall investigate the convergence properties from the viewpoint of systems theory.

Consider the matrix H in (2.2c), which is the transpose of a k-circulant with k = a [14, pp. 84-85] and can be expressed as

$H = (D^{-1}F)\,\Lambda\,(D^{-1}F)^{-1}$,   (2.4a)
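where the matrix $\Lambda = \mathrm{diag}[p(\alpha), p(\alpha W), \ldots, p(\alpha W^{n-1})]$ with $\alpha = \sqrt[n]{a}$, $W = e^{j2\pi/n}$ and $p(x) = \sum_{i=0}^{n-1} x^i$, the matrix $D = \mathrm{diag}[1, \alpha, \ldots, \alpha^{n-1}]$, and the Fourier matrix F [14, p. 32] is

$F = \frac{1}{\sqrt{n}}\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W^{-1} & W^{-2} & \cdots & W^{-(n-1)} \\ 1 & W^{-2} & W^{-4} & \cdots & W^{-2(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W^{-(n-1)} & W^{-2(n-1)} & \cdots & W^{-(n-1)^2} \end{bmatrix}$.   (2.4b)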


Hence the eigenvalues of H, denoted by $\lambda_i$ for i = 1, 2, ..., n, are $p(\alpha W^{i-1})$, and the associated eigenvectors of H are $(D^{-1}F)e_i$, where $e_i$ is the ith column of $I_n$. It also follows that the modal matrix of H, denoted by M, is $D^{-1}F$. Applying the similarity transformation

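$X(k) = D^{-1}F X_d(k) = M X_d(k)$   (2.5a)

to (2.2a) yields

$X_d(k+1) = \Lambda X_d(k), \quad X_d(0) = M^{-1}X(0) = \frac{1}{\sqrt{n}}[1, 1, \ldots, 1]^T$.   (2.5b)

The solution of (2.5b) is

$X_d(k) = \Lambda^k X_d(0) = \frac{1}{\sqrt{n}}[\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k]^T$,   (2.5c)

and the solution of (2.2a) becomes $X(k) = M X_d(k)$, i.e.,

$x_i(k) = \frac{1}{n}(\sqrt[n]{a})^{-(i-1)}\sum_{l=1}^{n}\lambda_l^k W^{-(l-1)(i-1)}$ for $i = 1, 2, \ldots, n$ and $k \ge 0$.   (2.5d)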
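The closed-form solution can be checked numerically against direct iteration of (2.2a). The sketch assumes $x_i(k) = \frac{1}{n}(\sqrt[n]{a})^{-(i-1)}\sum_{l}\lambda_l^k W^{-(l-1)(i-1)}$ with $\lambda_l = p(\sqrt[n]{a}\,W^{l-1})$, $p(x) = 1 + x + \cdots + x^{n-1}$ and $W = e^{j2\pi/n}$. It also assumes a value of a with arg(a) ≠ π and takes $\sqrt[n]{a}$ from Python's principal-branch complex power. The helper names are hypothetical.

```python
import cmath


def build_H(a: complex, n: int):
    """H of (2.2c): ones on and below the diagonal, a strictly above it."""
    return [[1 if c <= r else a for c in range(n)] for r in range(n)]


def iterate(a: complex, n: int, k: int):
    """X(k) from X(k+1) = H X(k), X(0) = [1, 0, ..., 0]^T."""
    H = build_H(a, n)
    X = [1] + [0] * (n - 1)
    for _ in range(k):
        X = [sum(H[r][c] * X[c] for c in range(n)) for r in range(n)]
    return X


def closed_form(a: complex, n: int, k: int):
    """x_i(k) = (1/n) alpha^{-(i-1)} sum_l lambda_l^k W^{-(l-1)(i-1)} (0-based in code)."""
    alpha = complex(a) ** (1.0 / n)                               # principal nth root of a
    W = cmath.exp(2j * cmath.pi / n)
    lam = [sum((alpha * W ** l) ** m for m in range(n)) for l in range(n)]  # p(alpha W^l)
    return [alpha ** (-i) / n * sum(lam[l] ** k * W ** (-l * i) for l in range(n))
            for i in range(n)]
```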
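Lemma 2.2.1

If arg(a) ≠ π and a ≠ 0, then $\lambda_1 \ne 0$ and $0 \le |\lambda_l/\lambda_1| < 1$ for $n \ge l > 1$.

Proof

When l = 1, $\lambda_1$ [= p(α)] becomes

$\lambda_1 = 1 + \sqrt[n]{a} + (\sqrt[n]{a})^2 + \cdots + (\sqrt[n]{a})^{n-1}$.

If a = 1, then $\lambda_1 = n \ne 0$ and $\lambda_l = 0$ for l > 1. Thus, $|\lambda_l/\lambda_1| = 0$ for $n \ge l > 1$. On the other hand, if a ≠ 1, then $\lambda_1 = (1-a)/(1-\sqrt[n]{a}) \ne 0$ and

$\frac{\lambda_l}{\lambda_1} = \frac{1 - \sqrt[n]{a}}{1 - \sqrt[n]{a}\,W^{l-1}}$.

Let $a = \rho e^{j\theta}$ and $\sqrt[n]{a} = \sqrt[n]{\rho}\,e^{j\theta/n}$. Thus, we have

$\left|\frac{\lambda_l}{\lambda_1}\right|^2 = \frac{1 + (\sqrt[n]{\rho})^2 - 2\sqrt[n]{\rho}\cos(\theta/n)}{1 + (\sqrt[n]{\rho})^2 - 2\sqrt[n]{\rho}\cos(\theta/n + 2\pi(l-1)/n)}$.

Since a ≠ 0, we have $\sqrt[n]{\rho} > 0$, and so the lemma is proved if

$\cos(\theta/n) > \cos(\theta/n + 2\pi(l-1)/n)$ for $l = 2, 3, \ldots, n$.

This holds whenever $-\pi < \theta < \pi$. In that case $|\theta/n| < \pi/n$, while $\theta/n + 2\pi(l-1)/n$ lies at a distance greater than $\pi/n$ from every multiple of $2\pi$. It follows that $0 \le |\lambda_l/\lambda_1|^2 < 1$ and $0 \le |\lambda_l/\lambda_1| < 1$ provided that θ ≠ π. ■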
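Lemma 2.2.1 can be explored numerically. The sketch computes ε = max over l ≥ 2 of |λ_l/λ_1|, the quantity used later in (2.10a). It uses λ_l = p(√[n]a W^{l-1}) with p(x) = 1 + x + ... + x^{n-1} and W = e^{j2π/n}, and takes √[n]a from Python's principal-branch complex power. For a = -1 (θ = π) the bound becomes an equality, which is why the lemma excludes arg(a) = π.

```python
import cmath


def mode_ratio(a: complex, n: int) -> float:
    """epsilon = max over l = 2..n of |lambda_l / lambda_1| (requires n >= 2)."""
    alpha = complex(a) ** (1.0 / n)          # principal nth root of a
    W = cmath.exp(2j * cmath.pi / n)

    def p(x):
        return sum(x ** m for m in range(n))

    lam1 = p(alpha)
    return max(abs(p(alpha * W ** l) / lam1) for l in range(1, n))


eps_regular = mode_ratio(2 + 1j, 4)    # strictly below 1
eps_boundary = mode_ratio(-1, 3)       # equal to 1 up to rounding, since arg(a) = pi
```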

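Theorem 2.2.1

$\lim_{k\to\infty}[x_i(k)/x_j(k)] = (\sqrt[n]{a})^{j-i}$ for $i \ne j$, $1 \le i, j \le n$, $a \ne 0$, and $\arg(a) \ne \pi$.

Proof

From (2.5d), we obtain

$\frac{x_i(k)}{x_j(k)} = (\sqrt[n]{a})^{j-i}\,\frac{\sum_{l=1}^{n}\lambda_l^k W^{-(l-1)(i-1)}}{\sum_{l=1}^{n}\lambda_l^k W^{-(l-1)(j-1)}}$.

Since $\lambda_1 \ne 0$, we have

$\frac{x_i(k)}{x_j(k)} = (\sqrt[n]{a})^{j-i}\,\frac{1 + \sum_{l=2}^{n}(\lambda_l/\lambda_1)^k W^{-(l-1)(i-1)}}{1 + \sum_{l=2}^{n}(\lambda_l/\lambda_1)^k W^{-(l-1)(j-1)}}$.   (2.6)

Using the result in Lemma 2.2.1, i.e., $0 \le |\lambda_l/\lambda_1| < 1$, we obtain

$\lim_{k\to\infty}\frac{x_i(k)}{x_j(k)} = (\sqrt[n]{a})^{j-i}$. ■

Corollary 2.2.1

Given the state equation defined in (2.2), the principal nth root of a can be found as

$\lim_{k\to\infty}\frac{x_i(k)}{x_{i+1}(k)} = \sqrt[n]{a}$ for $1 \le i < n$   (2.7)

if a ≠ 0 and arg(a) ≠ π. ■

Corollary 2.2.2

The pth power of the principal nth root of a can be found as

$\lim_{k\to\infty}\frac{x_1(k)}{x_{p+1}(k)} = (\sqrt[n]{a})^p$ for $n - 1 \ge p \ge 1$.   (2.8)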
2.3 Recursive Algorithms and Their Global Convergence Properties


From (2.2), we can compute each state $x_i(k)$ as follows:

$x_n(k) = \sum_{i=1}^{n} x_i(k-1)$,   (2.9a)

$x_i(k) = x_{i+1}(k) + (a-1)\,x_{i+1}(k-1)$ for $i = n-1, n-2, \ldots, 1$.   (2.9b)

The algorithm to compute the pth power of the principal nth root of a complex number a becomes

$(\sqrt[n]{a})^p = \lim_{k\to\infty}\frac{x_1(k)}{x_{p+1}(k)}$.   (2.9c)

The direct use of (2.9) to compute $(\sqrt[n]{a})^p$ may result in numerical overflow if the magnitude of any eigenvalue of H in (2.2) is larger than unity. However, this numerical difficulty may be overcome by normalizing $x_1(k)$ in (2.9) to unity for all k.
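Here is a minimal sketch of the linear-rate recursion with the normalization just described, in which every state is divided by $x_1(k)$ after each step. It uses $x_n(k) = \sum_i x_i(k-1)$ and $x_i(k) = x_{i+1}(k) + (a-1)\,x_{i+1}(k-1)$. The second relation follows from subtracting row i + 1 of H from row i. The sketch is written for a scalar a with a ≠ 0 and arg(a) ≠ π, and the iteration count is an arbitrary assumption. Convergence is only linear, as analyzed in (2.10).

```python
def linear_root(a: complex, n: int, iters: int = 200) -> complex:
    """Recursion (2.9) with x_1 normalized to 1; returns x_1/x_2, which tends to a**(1/n)."""
    x = [1.0 + 0j] + [0j] * (n - 1)                     # X(0) = [1, 0, ..., 0]^T
    for _ in range(iters):
        new = [0j] * n
        new[n - 1] = sum(x)                              # (2.9a)
        for i in range(n - 2, -1, -1):                   # (2.9b) with 0-based indices
            new[i] = new[i + 1] + (a - 1) * x[i + 1]
        x = [v / new[0] for v in new]                    # normalization: x_1 = 1
    return x[0] / x[1]
```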

To analyze the convergence rate of the algorithm in (2.9), we let

$\varepsilon = \max\{|\lambda_l/\lambda_1|,\ l = 2, 3, \ldots, n\}$   (2.10a)

and rewrite (2.6), with i = 1 and j = 2, as

$\frac{x_1(k)}{x_2(k)} = \sqrt[n]{a}\,\Delta(k)$,   (2.10b)

where

$\Delta(k) = \frac{1 + \sum_{l=2}^{n}(\lambda_l/\lambda_1)^k}{1 + \sum_{l=2}^{n}(\lambda_l/\lambda_1)^k W^{-(l-1)}}$.   (2.10c)

Then, by using Lemma 2.2.1 and assuming that k is sufficiently large, the error ratio $|(\sqrt[n]{a}\,\Delta(k) - \sqrt[n]{a})/\sqrt[n]{a}|$ becomes

$|\Delta(k) - 1| \le \frac{2(n-1)\varepsilon^k}{1 - (n-1)\varepsilon^k}$   (2.10d)

or

$|\Delta(k) - 1| = O(\varepsilon^k)$.   (2.10e)


Therefore, the algorithm in (2.9) has a linear convergence rate. The derivation of the convergence in (2.10) for the algorithm in (2.9) is similar to that of the Bernoulli-Aitken method [15,16], which is the well-known power method for finding the largest real or complex root of an algebraic equation.

The linear convergence of the algorithm in (2.9) is too slow for practical computations. We shall now develop iterative algorithms with higher-order convergence rates.


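Lemma 2.3.1

From (2.2), if the first column of $H^k$ is defined as

$h_k = [h_{1k}, h_{2k}, \ldots, h_{nk}]^T$,

then

$H^k = \begin{bmatrix} h_{1k} & a h_{nk} & a h_{(n-1)k} & \cdots & a h_{3k} & a h_{2k} \\ h_{2k} & h_{1k} & a h_{nk} & \cdots & a h_{4k} & a h_{3k} \\ h_{3k} & h_{2k} & h_{1k} & \cdots & a h_{5k} & a h_{4k} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ h_{(n-1)k} & h_{(n-2)k} & h_{(n-3)k} & \cdots & h_{1k} & a h_{nk} \\ h_{nk} & h_{(n-1)k} & h_{(n-2)k} & \cdots & h_{2k} & h_{1k} \end{bmatrix}$.   (2.11)

Proof

Since H satisfies $H\psi = \psi H$ for the matrix ψ given in [14, p. 84], Lemma 2.3.1 follows immediately.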


Lemma 2.3.2

The solution X(k) of the state equation in (2.2) is the first column of $H^k$ in (2.11).

Proof

Since $X(k) = HX(k-1) = H^k X(0)$ and $X(0) = [1, 0, \ldots, 0]^T$, the first column of $H^k$ is X(k). ■
Theorem 2.3.1

If the solution of the state equation in (2.2) at the kth step is X(k), then

$X(2k) = [x_1(2k), x_2(2k), \ldots, x_n(2k)]^T$,   (2.12)

where

$x_l(2k) = \sum_{i=1}^{l} x_i(k)\,x_{l+1-i}(k) + a\sum_{i=l+1}^{n} x_i(k)\,x_{n+l+1-i}(k)$ for $1 \le l < n$,

$x_n(2k) = \sum_{i=1}^{n} x_i(k)\,x_{n+1-i}(k)$, $k \ge 1$.

Proof

From (2.2), we have $X(2k) = H^k X(k)$. Using the results in Lemmas 2.3.1 and 2.3.2 yields the result in Theorem 2.3.1. ■
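Theorem 2.3.1 can be checked numerically: one application of the product rule to X(k) should reproduce X(2k) obtained by iterating (2.2a) directly. In the sketch, the rule is $x_l(2k) = \sum_{i=1}^{l} x_i x_{l+1-i} + a\sum_{i=l+1}^{n} x_i x_{n+l+1-i}$, where the last sum is empty for l = n. It is coded with 0-based indices, and the helper names are hypothetical. With an integer a, all quantities are integers, so the comparison is exact.

```python
def iterate(a: complex, n: int, k: int):
    """X(k) by direct iteration of (2.2a): H has ones on/below the diagonal, a above."""
    X = [1] + [0] * (n - 1)
    for _ in range(k):
        X = [sum(X[c] if c <= r else a * X[c] for c in range(n)) for r in range(n)]
    return X


def doubled(x, a: complex):
    """X(2k) from X(k) by the product rule of Theorem 2.3.1 (0-based indices)."""
    n = len(x)
    return [sum(x[i] * x[l - i] for i in range(l + 1))
            + a * sum(x[i] * x[n + l - i] for i in range(l + 1, n))
            for l in range(n)]


example = doubled(iterate(3, 3, 2), 3)   # -> [139, 97, 67], equal to iterate(3, 3, 4)
```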

From Theorem 2.3.1, we can establish a quadratically convergent algorithm for computing the principal nth roots of complex numbers.

Corollary 2.3.1

The convergence rate of the algorithm in (2.12) is quadratic.

Proof

Define $Z(k) = [z_1(k), z_2(k), \ldots, z_n(k)]^T = X(2^{k-1})$ for $k \ge 1$ and $Z(0) = X(0)$. From the algorithm in (2.12), we obtain

$Z(k+1) = X(2^k) = H^{2^k}X(0) = H^{2^k}Z(0) = H_z(k)Z(k)$,

where $H_z(k) = H^{2^{k-1}}$.

Define $r_1(k) = z_1(k)/z_2(k)$ and $\varepsilon = \max\{|\lambda_l/\lambda_1|,\ l = 2, 3, \ldots, n\}$. From (2.6), we have

$r_1(k) = \sqrt[n]{a}\,\Delta(k)$,

where

$\Delta(k) = \frac{1 + \sum_{l=2}^{n}(\lambda_l/\lambda_1)^{2^{k-1}}}{1 + \sum_{l=2}^{n}(\lambda_l/\lambda_1)^{2^{k-1}} W^{-(l-1)}}$.

Similar to the derivation in (2.10), the error ratio $|(\sqrt[n]{a}\,\Delta(k) - \sqrt[n]{a})/\sqrt[n]{a}|$ becomes

$|\Delta(k) - 1| \le \frac{2(n-1)\varepsilon^{2^{k-1}}}{1 - (n-1)\varepsilon^{2^{k-1}}}$

and

$|\Delta(k+1) - 1| = O(|\Delta(k) - 1|^2) = O(\varepsilon^{2^k})$

for large k.

Therefore, the algorithm in (2.12) converges quadratically.


Algorithms  with  higher  order  convergence  rates  are  established  below. 


Theorem 2.3.2

Define $Z(k) = X(q^{k-1})$, where q is a positive integer with $q \ge 2$ and $k \ge 1$. Also, define a state equation

$Z(k+1) = H_z^{q-1}(k)\,Z(k), \quad Z(0) = X(0)$,   (2.13a)

where $H_z(k) = H^{q^{k-1}}$ for $k \ge 1$, and $H_z(k)$ has the same structure as in (2.11), with first column $[z_1(k), z_2(k), \ldots, z_n(k)]^T$. Then the algorithm for finding the pth power of the principal nth root of a,

$(\sqrt[n]{a})^p = \lim_{k\to\infty}\frac{z_i(k)}{z_j(k)}$,   (2.13b)

where p = j - i, has a qth-order convergence rate.

Proof

Theorem 2.3.2 can be proven in a manner similar to Theorem 2.3.1 and Corollary 2.3.1. ■
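The qth-order scheme only needs the first columns of the matrices involved, because a product of two matrices with the structure (2.11) has the same structure. The sketch below is a scalar illustration under stated assumptions. It starts from Z(1) = X(1) = [1, 1, ..., 1]^T, the first column of H. It then forms $H_z^{q-1}(k)Z(k)$ through q - 1 structured products and normalizes $z_1 = 1$ after each step, which leaves the ratios unchanged. With q = 2 this is the doubling rule (2.12). The default iteration count and the names are hypothetical.

```python
def structured_times(u, v, a: complex):
    """H_u @ v, where H_u has the structure (2.11) with first column u (0-based).

    This is also the first column of H_u H_v, so structured products only
    need first columns.
    """
    n = len(u)
    return [sum(u[l - i] * v[i] for i in range(l + 1))
            + a * sum(u[n + l - i] * v[i] for i in range(l + 1, n))
            for l in range(n)]


def qth_order_root(a: complex, n: int, q: int = 2, p: int = 1, iters: int = 8) -> complex:
    """Approximate (a**(1/n))**p by the qth-order scheme (2.13); needs 1 <= p <= n - 1."""
    z = [1.0 + 0j] * n                     # Z(1) = X(1), the first column of H
    for _ in range(iters):
        w = z
        for _ in range(q - 1):
            w = structured_times(z, w, a)  # w <- H_z w; after the loop w = H_z^{q-1} Z(k)
        z = [v / w[0] for v in w]          # normalize z_1 = 1
    return z[0] / z[p]                     # x_1 / x_{p+1}, cf. (2.13b)
```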


2.4  The  Principal  nth  Roots  of  Complex  Matrices 

The  methods  described  in  Sections  2.2  and  2.3  for  computing  the  principal  nth 
roots  of  complex  numbers  can  be  extended  to  compute  the  principal  nth  roots  of 
complex  matrices.  The  principal  nth  root  of  a  complex  matrix  is  defined  below. 
Definition 2.4.1

Let $A \in \mathbb{C}^{m\times m}$ with $\sigma(A) = \{\lambda_i,\ i = 1, 2, \ldots, m\}$, $\lambda_i \ne 0$ and $\arg(\lambda_i) \ne \pi$. The principal nth root of A is defined as $\sqrt[n]{A} \in \mathbb{C}^{m\times m}$, where n is a positive integer and

(a) $(\sqrt[n]{A})^n = A$,

(b) each eigenvalue of $\sqrt[n]{A}$ is the principal nth root of the corresponding $\lambda_i$.

□


To derive a fast algorithm for computing the principal nth roots of complex matrices, we extend the discrete state equation in (2.2) to the following block-discrete-state equation:

$X(k+1) = G X(k), \quad X(0) = [I_m, 0_m, \ldots, 0_m]^T$,   (2.14a)

where the matrix G is the transpose of a block-k-circulant [14], viz.,

$G = \begin{bmatrix} I_m & A & A & \cdots & A & A \\ I_m & I_m & A & \cdots & A & A \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ I_m & I_m & I_m & \cdots & I_m & A \\ I_m & I_m & I_m & \cdots & I_m & I_m \end{bmatrix} \in \mathbb{C}^{nm\times nm}$   (2.14b)

and

$X(k) = [x_1^T(k), x_2^T(k), \ldots, x_n^T(k)]^T$.   (2.14c)

Note that the state variables $x_i(k)$ in (2.2b) are of dimension 1 × 1, whereas the block-state variables $x_i(k)$ in (2.14c) are of dimension m × m. The characteristic polynomial matrix [14] of G can be determined as


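$D(X) = X^n - {}_nC_1 X^{n-1} + {}_nC_2 X^{n-2}(I_m - A) - \cdots + (-1)^{n-1}\,{}_nC_{n-1} X (I_m - A)^{n-2} + (-1)^n\,{}_nC_n (I_m - A)^{n-1}$,   (2.15a)

where $D(X) \in \mathbb{C}^{m\times m}[X]$, $X \in \mathbb{C}^{m\times m}$, and ${}_nC_i$ are the coefficients of a binomial expansion.

The block eigenvalues of G, which are also known as the solvents [11,12] of D(X), can be obtained from D(X) in (2.15a) as

$D(\Lambda_i) = 0_m$ for $i = 1, 2, \ldots, n$,   (2.15b)

where $\Lambda_i \in \mathbb{C}^{m\times m}$ are the block eigenvalues of G.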
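The scalar case m = 1 gives a quick consistency check. With $D(\lambda) = \lambda^n + \sum_{i=1}^{n}(-1)^i\,{}_nC_i\,\lambda^{n-i}(1-a)^{i-1}$, every eigenvalue $\lambda_l = p(\sqrt[n]{a}\,W^{l-1})$ of H should be a root. For a ≠ 1, this form of the polynomial follows from the relation $(\lambda - 1 + a)^n = a\lambda^n$ satisfied by those eigenvalues, after dividing by 1 - a. The sketch assumes $W = e^{j2\pi/n}$ and Python's principal-branch complex power, and the names are hypothetical.

```python
import cmath
from math import comb


def D_scalar(lam: complex, a: complex, n: int) -> complex:
    """Scalar (m = 1) characteristic polynomial of H in the binomial form above."""
    return lam ** n + sum((-1) ** i * comb(n, i) * lam ** (n - i) * (1 - a) ** (i - 1)
                          for i in range(1, n + 1))


def eigenvalues_of_H(a: complex, n: int):
    """lambda_l = p(alpha W^{l-1}) with p(x) = 1 + x + ... + x^{n-1}, alpha = a**(1/n)."""
    alpha = complex(a) ** (1.0 / n)
    W = cmath.exp(2j * cmath.pi / n)
    return [sum((alpha * W ** l) ** m for m in range(n)) for l in range(n)]
```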

Let a set of complete solvents [11-13] of D(X) be

$\Lambda_l = \sum_{i=1}^{n}(\sqrt[n]{A}\,W^{l-1})^{i-1}$ for $1 \le l \le n$,   (2.16a)

where $W = e^{j2\pi/n}$. Then the block eigenvector [11] associated with $\Lambda_l$ becomes

$V_l = \frac{1}{\sqrt{n}}[I_m, (\sqrt[n]{A}\,W^{l-1})^{-1}, (\sqrt[n]{A}\,W^{l-1})^{-2}, \ldots, (\sqrt[n]{A}\,W^{l-1})^{-(n-1)}]^T \in \mathbb{C}^{nm\times m}$,   (2.16b)

and the corresponding block-modal matrix [11] is

$M = [V_1, V_2, \ldots, V_n] \in \mathbb{C}^{nm\times nm}$.   (2.16c)

Thus, following the derivations in (2.4) and (2.5) and employing the properties of the block-k-circulant [14] in (2.14b), the block-state equation in (2.14a) can be transformed into a block-diagonalized state equation by the following transformation [11]:

$X(k) = M X_d(k)$.   (2.17)

The transformed block-state equation becomes

$X_d(k+1) = G_d X_d(k)$,   (2.18a)

$X_d(0) = \frac{1}{\sqrt{n}}[I_m, I_m, \ldots, I_m]^T$,   (2.18b)

where

$G_d = M^{-1} G M = \text{block diag}[\Lambda_1, \Lambda_2, \ldots, \Lambda_n]$.   (2.18c)

The solution of the block-state equation in (2.14a) is

$x_i(k) = \frac{1}{n}(\sqrt[n]{A})^{-(i-1)}\sum_{l=1}^{n}\Lambda_l^k W^{-(l-1)(i-1)}$ for $i = 1, 2, \ldots, n$.   (2.18d)

Thus, we have the following result.
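Theorem 2.4.1

Let $A \in \mathbb{C}^{m\times m}$ with $\sigma(A) = \{\lambda_i,\ i = 1, 2, \ldots, m\}$, $\lambda_i \ne 0$, and $\arg(\lambda_i) \ne \pi$. Then

$\lim_{k\to\infty} x_i(k)\,x_j^{-1}(k) = (\sqrt[n]{A})^{j-i}$ for $1 \le i, j \le n$.   (2.19)

Proof

Since $\Lambda_l$ and $\sqrt[n]{A}$ (or $(\sqrt[n]{A})^{-1}$) commute, (2.18d) gives

$x_i(k)\,x_j^{-1}(k) = (\sqrt[n]{A})^{j-i}\left[\sum_{l=1}^{n}\Lambda_l^k W^{-(l-1)(i-1)}\right]\left[\sum_{l=1}^{n}\Lambda_l^k W^{-(l-1)(j-1)}\right]^{-1}$.

Let $\gamma_i$ for $1 \le i \le m$ be the eigenvalues of $\Lambda_l\Lambda_1^{-1}$, $l = 2, 3, \ldots, n$. Then, from (2.16a), we obtain

$\gamma_i = \frac{\sum_{j=1}^{n}(\sqrt[n]{\lambda_i}\,W^{l-1})^{j-1}}{\sum_{j=1}^{n}(\sqrt[n]{\lambda_i})^{j-1}}$ for $i = 1, 2, \ldots, m$.

From Lemma 2.2.1, we have $0 \le |\gamma_i| < 1$ if $\lambda_i \ne 0$ and $\arg(\lambda_i) \ne \pi$, $1 \le i \le m$, and so

$\lim_{k\to\infty}\Lambda_l^k(\Lambda_1^{-1})^k = \lim_{k\to\infty}(\Lambda_l\Lambda_1^{-1})^k = 0$ if $l \ne 1$.

Thus, we have

$\lim_{k\to\infty} x_i(k)\,x_j^{-1}(k) = (\sqrt[n]{A})^{j-i}$. ■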

Corollary 2.4.1

The principal nth root of a complex matrix $A \in \mathbb{C}^{m\times m}$ with $\sigma(A) = \{\lambda_i,\ i = 1, 2, \ldots, m\}$, $\lambda_i \ne 0$ and $\arg(\lambda_i) \ne \pi$, can be found as

$\lim_{k\to\infty} x_i(k)\,x_{i+1}^{-1}(k) = \sqrt[n]{A}$ for $1 \le i < n$.

Corollary 2.4.2

Given a complex matrix $A \in \mathbb{C}^{m\times m}$ as defined in Corollary 2.4.1, the principal nth root of A is unique. ■
Corollary 2.4.3

The pth power of the principal nth root of a complex matrix A as defined in Corollary 2.4.1 can be obtained as

$\lim_{k\to\infty} x_1(k)\,x_{p+1}^{-1}(k) = (\sqrt[n]{A})^p$ for $0 < p < n$.

Following Theorems 2.4.1 and 2.3.1 and Corollary 2.4.3, we can construct a quadratically convergent algorithm for computing the pth power of the principal nth root of a complex matrix A as follows.
Algorithm 2.4.1
Given:
    A = an m x m complex matrix with eigenvalues λ_i = ρ_i e^{jθ_i}, where ρ_i ≠ 0 and
        θ_i ≠ π for i = 1, 2, ..., m,
    n = root index,
    δ = error tolerance.
Find:
    (A^{1/n})^p = pth power of the principal nth root of A for 1 ≤ p ≤ n - 1.
Algorithm:
    {Initialization}
    for i := 1 to n do               {Initialize the states X_i, i = 1, 2, ..., n}
        X_i := I_m;
    R := O_m;                        {Initialize the principal nth root}
    {Computation of the principal nth root}
    repeat
        for i := 1 to n do           {Copy X_i to Y_i for i = 1, 2, ..., n}
            Y_i := X_i;
        for i := 1 to n do           {Compute X_i, i = 1, 2, ..., n}
        begin
            X_i := Y_1 Y_i;
            for j := 2 to n do
                if j ≤ i then
                    X_i := Y_j Y_{i-j+1} + X_i
                else
                    X_i := A Y_j Y_{n-j+i+1} + X_i
        end;
        R_1 := X_1 X_{p+1}^{-1};     {Find the pth power of the principal nth root}
        Δ := ||R - R_1||;            {Norm of the difference between the last and current iterates}
        if Δ > δ then                {Error is not within the specified tolerance}
        begin
            R := R_1;                {Copy R_1 to R}
            for i := 2 to n do       {Normalization}
                X_i := X_i X_1^{-1};
            X_1 := I_m
        end
    until Δ ≤ δ;                     {On exit, R_1 holds the result}
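The following Python sketch transcribes Algorithm 2.4.1 under a few assumptions. Plain lists of complex numbers stand in for matrices. The norm in Δ is taken to be the infinity norm, since the algorithm does not name one. The helper names (`matmul`, `inverse`, `principal_root_power`, and so on) are hypothetical. The state update is the quadratic recursion written out in (3.2b)-(3.2c), and the pth power is read off as $X_1X_{p+1}^{-1}$. This is an illustrative sketch, not the authors' implementation.

```python
# Minimal sketch of Algorithm 2.4.1 in pure Python (assumed transcription).

def identity(m):
    return [[complex(i == j) for j in range(m)] for i in range(m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matadd(a, b, s=1):
    """Return a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inversion with partial pivoting."""
    m = len(a)
    aug = [list(row) + e for row, e in zip(a, identity(m))]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def inf_norm(a):
    """Maximum absolute row sum (assumed choice of norm for Delta)."""
    return max(sum(abs(x) for x in row) for row in a)

def principal_root_power(A, n, p=1, tol=1e-12, max_iter=200):
    """Approximate (A^(1/n))^p, 1 <= p <= n-1, as X_1 X_{p+1}^{-1}."""
    m = len(A)
    X = [identity(m) for _ in range(n)]          # X_i := I_m
    R = [[0j] * m for _ in range(m)]              # R := O_m
    for _ in range(max_iter):
        Y = list(X)                                # Y_i := X_i
        X = []
        for i in range(n):                         # 0-based form of (3.2b)-(3.2c)
            Xi = matmul(Y[0], Y[i])
            for j in range(1, n):
                if j <= i:
                    Xi = matadd(Xi, matmul(Y[j], Y[i - j]))
                else:
                    Xi = matadd(Xi, matmul(A, matmul(Y[j], Y[n - j + i])))
            X.append(Xi)
        R1 = matmul(X[0], inverse(X[p]))
        delta = inf_norm(matadd(R, R1, -1))
        R = R1
        if delta <= tol:
            break
        X1_inv = inverse(X[0])                     # normalization: X_i := X_i X_1^{-1}
        X = [identity(m)] + [matmul(Xi, X1_inv) for Xi in X[1:]]
    return R
```

For example, `principal_root_power([[4, 1], [0, 9]], 2)` should approach `[[2, 0.2], [0, 3]]`, because that upper-triangular matrix squares to the input. Normalizing every iteration does not change the limit: the update is homogeneous in the commuting states $X_i$, so it only rescales them.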


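When $\arg(\lambda_i) = \pi$ for some eigenvalue $\lambda_i$ of $A$, Algorithm 2.4.1 cannot be applied directly to compute $(\sqrt[n]{A})^p$. The matrix $A$ can be rotated by a small angle to give $\bar{A} = Ae^{-j\Delta\beta}$ (where $\Delta\beta$ is a small positive real angle) so that $(\sqrt[n]{A})^p = (\sqrt[n]{\bar{A}})^p e^{jp\Delta\beta/n}$. This holds provided that no eigenvalue of $A$ has its argument in $(-\pi, -\pi + \Delta\beta]$, so that the rotation does not carry any eigenvalue across the branch cut.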
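The rotation device can be illustrated in the scalar case ($m = 1$). In the sketch below, the hypothetical helper `scalar_root` runs the same quadratic state recursion on numbers, rescaling so that $x_1 = 1$. The helper `rotated_root` rotates by $e^{-j\Delta\beta}$, takes the principal root, and rotates back by $e^{j\Delta\beta/n}$. The fixed count of 60 iterations is an arbitrary assumption, chosen to be well past convergence. For $a = -1$ and $n = 5$ the result should be $e^{j\pi/5}$, which is the eigenvalue $0.80902 + j0.58779$ quoted in Example 2.4.1.

```python
import cmath
import math

def scalar_root(a, n, iters=60):
    """Scalar (m = 1) form of the quadratic state recursion, rescaled so x_1 = 1.
    Returns x_1 / x_2, which tends to the principal n-th root when arg(a) != pi."""
    x = [1 + 0j] * n
    for _ in range(iters):
        y = x
        x = [sum(y[l] * y[i - l] if l <= i else a * y[l] * y[n + i - l] for l in range(n))
             for i in range(n)]
        x = [v / x[0] for v in x]
    return x[0] / x[1]

def rotated_root(a, n, dbeta):
    """Rotate a by exp(-j*dbeta), take the principal root, rotate back by exp(j*dbeta/n)."""
    return scalar_root(a * cmath.exp(-1j * dbeta), n) * cmath.exp(1j * dbeta / n)

z = rotated_root(-1, 5, math.radians(5))   # Delta beta = 5 degrees, as in Example 2.4.1
print(z)                                   # close to exp(j*pi/5)
```

With a real negative $a$ and no rotation, every state stays real, so the ratio $x_1/x_2$ can never reach the complex principal root. This is why the rotation is needed.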
Example 2.4.1

Given a complex matrix

$$A = \begin{bmatrix}
-1.25 + j3.25 & 2.50 + j3.50 & -2.75 - j3.25 & 6.25 + j0.75\\
4.00 + j1.75 & 6.00 - j1.50 & -6.50 + j2.75 & 6.00 - j7.75\\
-2.25 + j1.75 & -0.50 + j2.50 & 0.25 - j2.75 & 3.25 + j2.25\\
-2.00 - j1.50 & -3.00 - j1.00 & 4.00 + j0.50 & -4.00 + j1.50
\end{bmatrix},$$

it is desired to find $\sqrt[n]{A}$ with $n = 5$.

The matrix $A$ has an eigenvalue $-1$ with $\arg(-1) = \pi$. Thus, Algorithm 2.4.1 cannot be used directly to find $\sqrt[5]{A}$. The matrix $A$ is modified with the rotation angle $\Delta\beta = 5^\circ$ (or $\pi/36$). The modified matrix $\bar{A}$ becomes

$$\bar{A} = \begin{bmatrix}
-0.96199 + j3.34658 & 2.79553 + j3.26879 & -3.02279 - j2.99795 & 6.29158 + j0.20242\\
4.13730 + j1.39472 & 5.84643 - j2.01723 & -6.23559 + j3.30605 & 5.30171 - j8.24344\\
-2.08892 + j1.93944 & -0.28021 + j2.53406 & 0.00937 - j2.76132 & 3.43373 + j1.95818\\
-2.12312 - j1.31998 & -3.07574 - j0.73473 & 4.02836 + j0.14947 & -3.85408 + j1.84292
\end{bmatrix}.$$

Using Algorithm 2.4.1 with error tolerance $10^{-\ldots}$, we obtain the principal 5th root of $\bar{A}$ in 10 iterations as

$$\sqrt[5]{\bar{A}} = \begin{bmatrix}
1.23828 + j0.78835 & 0.70104 - j0.13515 & -0.63288 + j0.13618 & 0.49772 - j0.83723\\
0.63025 - j0.40270 & 1.61288 - j0.72001 & -0.37929 + j1.37819 & -0.15461 - j1.93558\\
0.18516 + j0.32017 & 0.31079 - j0.29380 & 0.69469 + j0.48856 & 0.06699 - j0.98547\\
-0.23397 + j0.10540 & -0.39025 - j0.15865 & 0.50842 - j0.22120 & 0.38842 + j0.42534
\end{bmatrix}.$$

Thus, the desired principal 5th root of $A$, obtained as $\sqrt[5]{\bar{A}}\,e^{j\Delta\beta/5}$, is given by

$$\sqrt[5]{A} = \begin{bmatrix}
1.22433 + j0.80985 & 0.70330 - j0.12290 & -0.63516 + j0.12512 & 0.51226 - j0.82841\\
0.63718 - j0.39163 & 1.62521 - j0.69175 & -0.40329 + j1.37136 & -0.12081 - j1.93799\\
0.17954 + j0.32336 & 0.31587 - j0.28833 & 0.68606 + j0.50061 & 0.08418 - j0.98415\\
-0.23577 + j0.10130 & -0.38742 - j0.16543 & 0.51220 - j0.21229 & 0.38094 + j0.43205
\end{bmatrix}.$$


It is interesting to note that the eigenvalues of $A$ are $-1$, $2 + j1$, $1 - j1$, and $-1 + j0.5$, whereas the eigenvalues of $\sqrt[5]{A}$ are $0.80902 + j0.58779$, $1.0586 - j0.16766$, $1.1696 + j0.10877$, and $0.87937 + j0.52186$. The arguments of all eigenvalues of $\sqrt[5]{A}$ lie in $(-\pi/5, \pi/5]$.
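The eigenvalue statements of Example 2.4.1 can be spot-checked with Python's `cmath` module. Its `log` returns the principal branch (imaginary part in $(-\pi, \pi]$), so `exp(log(z)/5)` is the principal fifth root of a scalar. The sketch also checks the rotation relation on the (1,1) entries: $\bar a_{11} = a_{11}e^{-j\Delta\beta}$ and $(\sqrt[5]{A})_{11} = (\sqrt[5]{\bar A})_{11}e^{j\Delta\beta/5}$. The tolerance of $10^{-4}$ reflects the five printed decimals and is our own choice.

```python
import cmath
import math

eig_A = [-1, 2 + 1j, 1 - 1j, -1 + 0.5j]                 # eigenvalues of A in Example 2.4.1
eig_root = [cmath.exp(cmath.log(z) / 5) for z in eig_A]  # principal fifth roots
printed = [0.80902 + 0.58779j, 1.1696 + 0.10877j,        # values quoted in the text,
           1.0586 - 0.16766j, 0.87937 + 0.52186j]        # reordered to match eig_A

dbeta = math.radians(5)
abar_11 = (-1.25 + 3.25j) * cmath.exp(-1j * dbeta)          # text: -0.96199 + j3.34658
root_11 = (1.23828 + 0.78835j) * cmath.exp(1j * dbeta / 5)  # text:  1.22433 + j0.80985

for z, w in zip(eig_root, printed):
    print(f"{z:.5f}  vs  {w}")
```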
Example 2.4.2

Given a complex matrix

$$A = \begin{bmatrix}
1.20 - j0.10 & 0.60 - j1.30 & -1.35 + j0.05 & -0.95 - j0.65\\
-0.70 + j0.10 & 0.40 - j1.20 & 0.35 + j0.45 & -0.55 - j0.35\\
1.10 + j1.20 & 1.05 - j1.15 & -1.55 - j1.10 & -1.10 - j1.95\\
0.80 + j0.10 & -0.35 - j0.45 & -0.40 - j0.05 & -0.55 - j0.60
\end{bmatrix},$$

it is desired to find $\sqrt[5]{A}$.

The matrix $A$ has an eigenvalue 1.0 and a Jordan chain of length 3 associated with the eigenvalue $-0.5 - j1.0$. The Jordan form of $A$ can be found as

$$J = \begin{bmatrix}
1.0 & 0.0 & 0.0 & 0.0\\
0.0 & -0.5 - j1.0 & 1.0 & 0.0\\
0.0 & 0.0 & -0.5 - j1.0 & 1.0\\
0.0 & 0.0 & 0.0 & -0.5 - j1.0
\end{bmatrix}.$$

Using Algorithm 2.2.1 with error tolerance $10^{-\ldots}$, we obtain the principal 5th root of $A$ in 7 iterations as follows:

$$\sqrt[5]{A} = \begin{bmatrix}
0.85676 + j0.00206 & 0.23415 + j0.02042 & -0.04496 - j0.29987 & 0.10000 - j0.02759\\
0.17671 - j0.00329 & 1.11192 - j0.21818 & -0.00296 - j0.00354 & 0.06362 - j0.14713\\
0.26275 + j0.28943 & 0.26571 + j0.16929 & 0.98433 - j0.64590 & 0.34128 - j0.10743\\
0.10531 + j0.14969 & 0.2219 - j0.12750 & -0.05265 - j0.07484 & 0.86423 - j0.35203
\end{bmatrix}.$$


It is interesting to note that the eigenvalues of $\sqrt[5]{A}$ are $1$, $0.93908 - j0.40468$, $0.93908 - j0.40468$, and $0.93908 - j0.40468$. The arguments of all eigenvalues of $\sqrt[5]{A}$ lie between $-\pi/5$ and $+\pi/5$.

This example demonstrates that Algorithm 2.4.1 can equally be used to find the principal nth roots of complex matrices having unity eigenvalues and/or Jordan chains of length greater than one.
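The trace of a matrix equals the sum of its eigenvalues, so the data printed in Example 2.4.2 can be cross-checked without forming any matrix root. The diagonal of $A$ should sum to $1 + 3(-0.5 - j1.0)$. The diagonal of $\sqrt[5]{A}$ should sum to $1 + 3\,(-0.5 - j1.0)^{1/5}$, taking the principal branch. This is a minimal check; the tolerance is our assumption.

```python
import cmath

diag_A = [1.20 - 0.10j, 0.40 - 1.20j, -1.55 - 1.10j, -0.55 - 0.60j]
diag_root = [0.85676 + 0.00206j, 1.11192 - 0.21818j, 0.98433 - 0.64590j, 0.86423 - 0.35203j]

eig_A = [1.0] + [-0.5 - 1.0j] * 3                        # 1 and a Jordan chain of length 3
eig_root = [cmath.exp(cmath.log(z) / 5) for z in eig_A]  # principal fifth roots

trace_gap_A = abs(sum(diag_A) - sum(eig_A))
trace_gap_root = abs(sum(diag_root) - sum(eig_root))
print(trace_gap_A, trace_gap_root, eig_root[1])
```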

2.5  Conclusion 

The  generalized  continued-fraction  method  developed  for  finding  the  nth  roots 
of  real  numbers  has  been  extended  to  determine  the  principal  nth  roots  of  complex 
matrices.  Computational  algorithms  with  high  order  convergence  rates  have  been 
established  for  determination  of  the  principal  nth  root  and  associated  pth  power 
of  the  principal  nth  root  of  a  complex  matrix.  The  global  convergence  properties 
of  the  proposed  algorithms  have  been  investigated  from  the  viewpoint  of  systems 
theory. 




Chapter  3 


Fast  and  Stable  Algorithms  for  Computing  the  Principal 

nth  Root  of  a  Complex  Matrix 


This  chapter  presents  rapidly  convergent  and  more  stable  recursive  algorithms 
for  finding  the  principal  nth  root  of  a  complex  matrix.  The  developed  algorithms 
significantly  improve  the  computational  aspects  of  finding  the  principal  nth  root 
of  a  matrix.  Thus,  the  developed  algorithms  will  enhance  the  capabilities  of  the 
existing  computational  algorithms  such  as  the  principal  nth  root  algorithm,  the 
matrix-sign  algorithm  and  the  matrix-sector  algorithm  for  developing  applications 
to  control-system  problems  [61]. 


3.1  Introduction 

Computational methods for finding the nth roots of some specific matrices have been proposed in [1,3,4,9] and [17]-[22]. Hoskins and Walton [4], using the Newton-Raphson algorithm, derived a fast and stable method for computing the nth roots of positive-definite matrices. Based on a spectral-decomposition technique obtained from the matrix-sign function [17] together with the Hoskins-Walton algorithm [4], Denman et al. [18,19] proposed an algorithm to compute the nth roots of real and complex matrices without prior knowledge of their eigenvalues and eigenvectors. However, the nth root of a general matrix computed by the above algorithms is, in general, not the principal nth root of the matrix. The principal nth root of a matrix can be used to construct the matrix-sign function [9,17] and the matrix-sector function [26,27], to solve the matrix Lyapunov and Riccati equations [1,17,23,24,25], and to approximate some matrix-valued functions [28]. Shieh et al. [20] first proposed an algorithm to compute the principal nth roots of complex matrices. To improve the convergence rate of the algorithm in [20], Tsay et al. [21] derived a fast algorithm using the matrix continued-fraction method to compute the principal nth roots of complex matrices. However, these two algorithms [20,21] are not numerically stable. For example, for an ill-conditioned matrix such as a stiff matrix containing both large and small eigenvalues, the algorithms in [20,21] converge in the first few iterations and then diverge very quickly. To overcome this numerical-stability problem, Higham [22] and Shieh et al. [29] proposed fast and stable algorithms for computing the principal square root of a complex matrix. Since the algorithms in [22,29] compute only the principal square root of a matrix, they cannot be applied to compute the principal nth root of a complex matrix when n is not a power of two. In this chapter, we generalize the fast and stable algorithm in [29] to compute the principal nth root of a complex matrix and then extend the algorithm to compute the matrix-sector function.

This chapter is organized as follows. In Section 3.2, we summarize the fast algorithm for finding the principal nth root of a matrix. Next, fast and stable recursive algorithms for finding the principal nth root of a matrix are developed in Section 3.3. An illustrative example is given in Section 3.4, and the results are summarized in Section 3.5.

3.2 Summary of the Fast Algorithm for Finding the Principal nth Root of a Matrix

The  fast  algorithm  [21]  which  was  derived  via  the  matrix  continued-fraction 
method  for  finding  the  principal  nth  root  of  a  complex  matrix  is  summarized  below. 

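Consider a block-discrete-state equation

$$X(k+1) = H^{r-1}(k)X(k), \qquad X(0) = [I_m, I_m, \ldots, I_m]^T, \qquad \text{for } k = 0, 1, 2, \ldots. \tag{3.1a}$$

Then we have

$$\lim_{k\to\infty} X_i(k)X_{i+1}^{-1}(k) = \sqrt[n]{A} \quad \text{for } n \ge 2 \text{ and } i \in [1, n-1], \tag{3.1b}$$

where $I_m$ denotes the identity matrix of dimension $m \times m$, and $H(k) \in \mathbb{C}^{nm\times nm}$ is the transpose of a block K-circulant matrix with $K = A$ [21], viz.,

$$H(k) = \begin{bmatrix}
X_1(k) & AX_n(k) & AX_{n-1}(k) & \cdots & AX_3(k) & AX_2(k)\\
X_2(k) & X_1(k) & AX_n(k) & \cdots & AX_4(k) & AX_3(k)\\
X_3(k) & X_2(k) & X_1(k) & \cdots & AX_5(k) & AX_4(k)\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
X_{n-1}(k) & X_{n-2}(k) & X_{n-3}(k) & \cdots & X_1(k) & AX_n(k)\\
X_n(k) & X_{n-1}(k) & X_{n-2}(k) & \cdots & X_2(k) & X_1(k)
\end{bmatrix} \in \mathbb{C}^{nm\times nm}, \tag{3.1c}$$

$$X(k) = [X_1^T(k), X_2^T(k), \ldots, X_n^T(k)]^T \in \mathbb{C}^{nm\times m}. \tag{3.1d}$$

Here $X_i(k) \in \mathbb{C}^{m\times m}$, $i = 1, 2, \ldots, n$, are block elements, and $r\ (\ge 2)$ is the convergence rate of the algorithm in (3.1). Note that the $X_i(k)$, $i = 1, 2, \ldots, n$, commute with one another and with $A$.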
The solution $X(k)$ of the block-state equation in (3.1) is the first block column of $H(k)$ in (3.1c). By exploiting the K-circulant structure, we obtain the algorithm with the quadratic convergence rate ($r = 2$) for computing the principal nth root of a complex matrix, given below.


Theorem 3.2.1 [21]

Let $X(k)$ be the solution at the kth step of the block-state equation in (3.1) with the quadratic convergence rate ($r = 2$). Then we have

$$X(k) = H(k-1)X(k-1), \tag{3.2a}$$

where

$$X_i(k) = \sum_{l=1}^{i} X_l(k-1)X_{i-l+1}(k-1) + A\sum_{l=i+1}^{n} X_l(k-1)X_{n-l+i+1}(k-1) \quad \text{for } 1 \le i \le n-1, \tag{3.2b}$$

$$X_n(k) = \sum_{l=1}^{n} X_l(k-1)X_{n+1-l}(k-1) \quad \text{for } k \ge 1. \tag{3.2c}$$

Also, we obtain

$$\lim_{k\to\infty} X_i(k)X_j^{-1}(k) = \left(\sqrt[n]{A}\right)^{j-i} \quad \text{for } i \ge 1 \text{ and } j \le n \tag{3.3a}$$

and

$$\lim_{k\to\infty} X_i(k)X_{i+1}^{-1}(k) = \sqrt[n]{A} \quad \text{for } 1 \le i \le n-1. \tag{3.3b}$$

The principal nth root of a matrix is unique. □
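With scalar blocks ($m = 1$), the structure of (3.1c) can be inspected directly. The sketch below uses hypothetical helper names. It builds $H(k)$ as the transpose of the $a$-circulant matrix and applies $x(k+1) = H^{r-1}(k)x(k)$ with $H$ frozen at $x(k)$. It then rescales so that $x_1 = 1$. The rescaling leaves the ratios in (3.3) unchanged because the update is homogeneous in $x$. As a hand check of (3.2b)-(3.2c), one step from $x = (1, 0.5, 0.25)$ with $a = 8$ gives $x_1 = 1 + 8(0.125 + 0.125) = 3$, $x_2 = 0.5 + 0.5 + 8(0.0625) = 1.5$, and $x_3 = 0.75$.

```python
def build_H(x, a):
    """Transpose of the a-circulant matrix (3.1c) for scalar states x_1..x_n (0-based list)."""
    n = len(x)
    return [[x[i - j] if j <= i else a * x[n + i - j] for j in range(n)] for i in range(n)]

def step(x, a, r=2):
    """One step of (3.1a): x(k+1) = H(k)^(r-1) x(k), with H(k) built from x(k)."""
    H = build_H(x, a)
    y = list(x)
    for _ in range(r - 1):
        y = [sum(H[i][j] * y[j] for j in range(len(y))) for i in range(len(y))]
    return y

def root_via_states(a, n, r=2, iters=40):
    """Return x_1 / x_2 after repeated steps; this tends to the principal n-th root of a."""
    x = [1.0] * n
    for _ in range(iters):
        x = step(x, a, r)
        x = [v / x[0] for v in x]      # rescale so that x_1 = 1
    return x[0] / x[1]

print(step([1.0, 0.5, 0.25], 8.0))     # one quadratic step, cf. (3.2b)-(3.2c)
print(root_via_states(8.0, 3))         # tends to 2
```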

When the matrix $A$ has a negative real eigenvalue (i.e., $\arg(\lambda) = \pi$ for some $\lambda \in \sigma(A)$), the algorithm in [21] cannot be applied directly to compute $\sqrt[n]{A}$. The matrix $A$ can be rotated by a small angle to give $\bar{A} = Ae^{-j\Delta\beta}$ (where $\Delta\beta$ is a small positive real angle) so that $(\sqrt[n]{A})^p = (\sqrt[n]{\bar{A}})^p e^{jp\Delta\beta/n}$.
3.3 Fast and Stable Algorithms for Finding the Principal nth Root of a Matrix

The purpose of this section is to generalize the fast and stable algorithm in [29] for computing the principal square root of a complex matrix to a fast and stable algorithm for computing the principal nth root of a complex matrix.

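Premultiplying both sides of the block-state equation in (3.1a) by the matrix $\operatorname{block\ diag}[X_1^{-1}(k), X_1^{-1}(k), \ldots, X_1^{-1}(k)] \in \mathbb{C}^{nm\times nm}$, and defining $X_1^{-1}(k)X_i(k+1) = \bar{X}_i(k+1)$ for $i = 1, 2, \ldots, n$ and $X_1(k)X_2^{-1}(k) = R(k)$, we obtain the normalized equivalent of the block-state equation in (3.1a) as

$$\bar{X}(k+1) = \bar{H}^{r-1}(k)\bar{X}(k), \tag{3.4a}$$

where

$$\bar{H}(k) = \begin{bmatrix}
I_m & AR^{-(n-1)}(k) & AR^{-(n-2)}(k) & \cdots & AR^{-2}(k) & AR^{-1}(k)\\
R^{-1}(k) & I_m & AR^{-(n-1)}(k) & \cdots & AR^{-3}(k) & AR^{-2}(k)\\
R^{-2}(k) & R^{-1}(k) & I_m & \cdots & AR^{-4}(k) & AR^{-3}(k)\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
R^{-(n-2)}(k) & R^{-(n-3)}(k) & R^{-(n-4)}(k) & \cdots & I_m & AR^{-(n-1)}(k)\\
R^{-(n-1)}(k) & R^{-(n-2)}(k) & R^{-(n-3)}(k) & \cdots & R^{-1}(k) & I_m
\end{bmatrix} \in \mathbb{C}^{nm\times nm}, \tag{3.4b}$$

$$\bar{X}(k+1) = [\bar{X}_1^T(k+1), \bar{X}_2^T(k+1), \ldots, \bar{X}_n^T(k+1)]^T \in \mathbb{C}^{nm\times m}, \tag{3.4c}$$

$$\bar{X}(k) = [I_m, (R^{-1}(k))^T, (R^{-2}(k))^T, \ldots, (R^{-(n-1)}(k))^T]^T, \tag{3.4d}$$

with

$$R(k+1) = \bar{X}_1(k+1)\bar{X}_2^{-1}(k+1), \qquad \lim_{k\to\infty} R(k) = \sqrt[n]{A}. \tag{3.4e}$$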

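A recursive form can be obtained from (3.4a) by using the following definitions:

$$Y(k) = \bar{H}^{r-1}(k)\bar{X}(k), \tag{3.5a}$$

$$Y_j(k) = \bar{H}^{j-1}(k)\bar{X}(k), \tag{3.5b}$$

where

$$Y_j(k) = [Y_{1,j}^T(k), (R^{-1}(k)Y_{2,j}(k))^T, (R^{-2}(k)Y_{3,j}(k))^T, \ldots, (R^{-(n-1)}(k)Y_{n,j}(k))^T]^T, \tag{3.5c}$$

with the block vector $Y_j(k) \in \mathbb{C}^{nm\times m}$ and the block elements $Y_{i,j}(k) \in \mathbb{C}^{m\times m}$ for $i = 1, 2, \ldots, n$. Note that the subscript $j$ in (3.5) denotes the index of the convergence rate. Then, from (3.5), we obtain the following recursive algorithm for $j = 2, 3, \ldots, r$ and any $k$:

$$\begin{bmatrix} Y_{1,j}(k)\\ Y_{2,j}(k)\\ \vdots\\ Y_{n,j}(k) \end{bmatrix} = T(k)\begin{bmatrix} Y_{1,(j-1)}(k)\\ Y_{2,(j-1)}(k)\\ \vdots\\ Y_{n,(j-1)}(k) \end{bmatrix}, \qquad Y_{i,1}(k) = I_m \ \text{ for } i = 1, 2, \ldots, n, \tag{3.5d}$$

where

$$T(k) = \begin{bmatrix}
I_m & AR^{-n}(k) & AR^{-n}(k) & \cdots & AR^{-n}(k)\\
I_m & I_m & AR^{-n}(k) & \cdots & AR^{-n}(k)\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
I_m & I_m & \cdots & I_m & AR^{-n}(k)\\
I_m & I_m & \cdots & I_m & I_m
\end{bmatrix} \in \mathbb{C}^{nm\times nm}. \tag{3.5e}$$

Substituting $\bar{X}_1(k+1) = Y_{1,r}(k)$ and $\bar{X}_2(k+1) = R^{-1}(k)Y_{2,r}(k)$ into (3.4e), we have

$$R(k+1) = R(k)Y_{2,r}^{-1}(k)Y_{1,r}(k), \qquad R(0) = I_m, \quad \text{for } k = 0, 1, 2, \ldots. \tag{3.6a}$$

Note that $R(k)$, $Y_{2,r}(k)$, and $Y_{1,r}(k)$ commute with one another. Let us define $G(k) = AR^{-n}(k)$. Then, from (3.6a), we obtain the following equation:

$$G(k+1) = G(k)\left[Y_{2,r}(k)Y_{1,r}^{-1}(k)\right]^n, \qquad G(0) = A, \quad \text{for } k = 0, 1, 2, \ldots. \tag{3.6b}$$

Expanding the matrix equation in (3.5d), we have

$$\begin{aligned}
Y_{1,j}(k) &= Y_{1,(j-1)}(k) + G(k)\left[Y_{2,(j-1)}(k) + Y_{3,(j-1)}(k) + \cdots + Y_{n,(j-1)}(k)\right],\\
Y_{2,j}(k) &= Y_{1,(j-1)}(k) + Y_{2,(j-1)}(k) + G(k)\left[Y_{3,(j-1)}(k) + \cdots + Y_{n,(j-1)}(k)\right],\\
&\ \ \vdots\\
Y_{(n-1),j}(k) &= Y_{1,(j-1)}(k) + Y_{2,(j-1)}(k) + \cdots + Y_{(n-1),(j-1)}(k) + G(k)Y_{n,(j-1)}(k),\\
Y_{n,j}(k) &= Y_{1,(j-1)}(k) + Y_{2,(j-1)}(k) + \cdots + Y_{n,(j-1)}(k),
\end{aligned} \tag{3.7a}$$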
or, in general form,

$$Y_{i,j}(k) = \sum_{l=1}^{i} Y_{l,(j-1)}(k) + G(k)\sum_{l=i+1}^{n} Y_{l,(j-1)}(k) \quad \text{for } j = 2, 3, \ldots, r,\ i = 1, 2, \ldots, n, \text{ and } k = 0, 1, 2, \ldots. \tag{3.7b}$$

Combining the algorithms in (3.6a), (3.6b), and (3.7b), we obtain the desired algorithm for $k = 0, 1, 2, \ldots$ as follows:

$$Y_{i,j}(k) = \sum_{l=1}^{i} Y_{l,(j-1)}(k) + G(k)\sum_{l=i+1}^{n} Y_{l,(j-1)}(k), \quad Y_{i,1}(k) = I_m, \quad \text{for } j = 2, 3, \ldots, r \text{ and } i = 1, 2, \ldots, n, \tag{3.8a}$$

$$G(k+1) = G(k)\left[Y_{2,r}(k)Y_{1,r}^{-1}(k)\right]^n, \qquad G(0) = A, \qquad \lim_{k\to\infty} G(k) = I_m, \tag{3.8b}$$

$$R(k+1) = R(k)Y_{2,r}^{-1}(k)Y_{1,r}(k), \qquad R(0) = I_m, \qquad \lim_{k\to\infty} R(k) = \sqrt[n]{A}, \tag{3.8c}$$

where $n$ denotes the index of the nth root of a matrix and $r$ is the order of the desired convergence rate. Letting $r = 2$ and $r = 3$ in (3.8), respectively, we obtain the nth root algorithms shown below.
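The general recursion (3.8a)-(3.8c) can be sketched in Python as follows. Matrices are nested lists of complex numbers, inversion is plain Gauss-Jordan elimination with partial pivoting, and a fixed iteration count replaces a convergence test. These choices and the function names are assumptions of the sketch, not details taken from the text. The sketch uses $Y_{2,r}^{-1}Y_{1,r}$ for $R$ and $Y_{2,r}Y_{1,r}^{-1}$ for $G$. That ordering is legitimate only because all iterates are functions of $A$ and therefore commute.

```python
def identity(m):
    return [[complex(i == j) for j in range(m)] for i in range(m)]

def zeros(m):
    return [[0j] * m for _ in range(m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inversion with partial pivoting."""
    m = len(a)
    aug = [list(row) + e for row, e in zip(a, identity(m))]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def nth_root(A, n, r=2, iters=40):
    """Sketch of (3.8): r-th order recursion for the principal n-th root of A."""
    m = len(A)
    I = identity(m)
    G = [[complex(v) for v in row] for row in A]   # G(0) = A
    R = identity(m)                                 # R(0) = I_m
    for _ in range(iters):
        Y = [I] * n                                 # Y_{i,1}(k) = I_m
        for _j in range(2, r + 1):                  # (3.8a)
            new_Y = []
            for i in range(n):
                lower, upper = zeros(m), zeros(m)
                for l in range(i + 1):
                    lower = matadd(lower, Y[l])
                for l in range(i + 1, n):
                    upper = matadd(upper, Y[l])
                new_Y.append(matadd(lower, matmul(G, upper)))
            Y = new_Y
        S = matmul(Y[1], inverse(Y[0]))             # Y_{2,r} Y_{1,r}^{-1}
        Sn = I
        for _ in range(n):
            Sn = matmul(Sn, S)
        G = matmul(G, Sn)                           # (3.8b)
        R = matmul(R, matmul(inverse(Y[1]), Y[0]))  # (3.8c)
    return R
```

With $n = 2$ and $r = 2$, the loop reduces to the Newton-type square-root iteration $R(k+1) = \tfrac{1}{2}[R(k) + AR^{-1}(k)]$, because $R(k)G(k) = AR^{-1}(k)$.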

When $r = 2$, (3.8) becomes

$$G(k+1) = G(k)\left\{[2I_m + (n-2)G(k)][I_m + (n-1)G(k)]^{-1}\right\}^n, \qquad G(0) = A, \qquad \lim_{k\to\infty} G(k) = I_m, \tag{3.9a}$$

$$R(k+1) = R(k)[2I_m + (n-2)G(k)]^{-1}[I_m + (n-1)G(k)], \qquad R(0) = I_m, \qquad \lim_{k\to\infty} R(k) = \sqrt[n]{A}. \tag{3.9b}$$

When $r = 3$, we obtain

$$G(k+1) = G(k)\left\{\left[3I_m + \tfrac{n^2+5n-12}{2}G(k) + \tfrac{n^2-5n+6}{2}G^2(k)\right]\left[I_m + \tfrac{n^2+3n-4}{2}G(k) + \tfrac{n^2-3n+2}{2}G^2(k)\right]^{-1}\right\}^n, \quad G(0) = A, \quad \lim_{k\to\infty} G(k) = I_m, \tag{3.9c}$$

$$R(k+1) = R(k)\left[3I_m + \tfrac{n^2+5n-12}{2}G(k) + \tfrac{n^2-5n+6}{2}G^2(k)\right]^{-1}\left[I_m + \tfrac{n^2+3n-4}{2}G(k) + \tfrac{n^2-3n+2}{2}G^2(k)\right], \quad R(0) = I_m, \quad \lim_{k\to\infty} R(k) = \sqrt[n]{A}. \tag{3.9d}$$

Now we list some commonly used pairs.

When $r = 2$ and $n = 2$, we have

$$G(k+1) = G(k)\left\{[2I_m][I_m + G(k)]^{-1}\right\}^2, \qquad G(0) = A, \tag{3.10a}$$

$$R(k+1) = R(k)[2I_m]^{-1}[I_m + G(k)], \qquad R(0) = I_m, \tag{3.10b}$$

$$\lim_{k\to\infty} R(k) = \sqrt{A}. \tag{3.10c}$$

When $r = 2$ and $n = 3$, we have

$$G(k+1) = G(k)\left\{[2I_m + G(k)][I_m + 2G(k)]^{-1}\right\}^3, \qquad G(0) = A, \tag{3.11a}$$

$$R(k+1) = R(k)[2I_m + G(k)]^{-1}[I_m + 2G(k)], \qquad R(0) = I_m, \tag{3.11b}$$

$$\lim_{k\to\infty} R(k) = \sqrt[3]{A}. \tag{3.11c}$$

When $r = 2$ and $n = 4$, we have

$$G(k+1) = G(k)\left\{[2I_m + 2G(k)][I_m + 3G(k)]^{-1}\right\}^4, \qquad G(0) = A, \tag{3.12a}$$

$$R(k+1) = R(k)[2I_m + 2G(k)]^{-1}[I_m + 3G(k)], \qquad R(0) = I_m, \tag{3.12b}$$

$$\lim_{k\to\infty} R(k) = \sqrt[4]{A}. \tag{3.12c}$$

When $r = 3$ and $n = 2$, we have

$$G(k+1) = G(k)\left\{[3I_m + G(k)][I_m + 3G(k)]^{-1}\right\}^2, \qquad G(0) = A, \tag{3.13a}$$

$$R(k+1) = R(k)[3I_m + G(k)]^{-1}[I_m + 3G(k)], \qquad R(0) = I_m, \tag{3.13b}$$

$$\lim_{k\to\infty} R(k) = \sqrt{A}. \tag{3.13c}$$

When $r = 3$ and $n = 3$, we have

$$G(k+1) = G(k)\left\{[3I_m + 6G(k)][I_m + 7G(k) + G^2(k)]^{-1}\right\}^3, \qquad G(0) = A, \tag{3.14a}$$

$$R(k+1) = R(k)[3I_m + 6G(k)]^{-1}[I_m + 7G(k) + G^2(k)], \qquad R(0) = I_m, \tag{3.14b}$$

$$\lim_{k\to\infty} R(k) = \sqrt[3]{A}. \tag{3.14c}$$

When $r = 3$ and $n = 4$, we have

$$G(k+1) = G(k)\left\{[3I_m + 12G(k) + G^2(k)][I_m + 12G(k) + 3G^2(k)]^{-1}\right\}^4, \qquad G(0) = A, \tag{3.15a}$$

$$R(k+1) = R(k)[3I_m + 12G(k) + G^2(k)]^{-1}[I_m + 12G(k) + 3G^2(k)], \qquad R(0) = I_m, \tag{3.15b}$$

$$\lim_{k\to\infty} R(k) = \sqrt[4]{A}. \tag{3.15c}$$

Some other cases are listed below.

When $r = 4$ and $n = 2$, we have

$$G(k+1) = G(k)\left\{[4I_m + 4G(k)][I_m + 6G(k) + G^2(k)]^{-1}\right\}^2, \qquad G(0) = A, \tag{3.16a}$$

$$R(k+1) = R(k)[4I_m + 4G(k)]^{-1}[I_m + 6G(k) + G^2(k)], \qquad R(0) = I_m, \tag{3.16b}$$

$$\lim_{k\to\infty} R(k) = \sqrt{A}. \tag{3.16c}$$

When $r = 4$ and $n = 3$, we have

$$G(k+1) = G(k)\left\{[4I_m + 19G(k) + 4G^2(k)][I_m + 16G(k) + 10G^2(k)]^{-1}\right\}^3, \qquad G(0) = A, \tag{3.17a}$$

$$R(k+1) = R(k)[4I_m + 19G(k) + 4G^2(k)]^{-1}[I_m + 16G(k) + 10G^2(k)], \qquad R(0) = I_m, \tag{3.17b}$$

$$\lim_{k\to\infty} R(k) = \sqrt[3]{A}. \tag{3.17c}$$

When $r = 4$ and $n = 4$, we have

$$G(k+1) = G(k)\left\{[4I_m + 40G(k) + 20G^2(k)][I_m + 31G(k) + 31G^2(k) + G^3(k)]^{-1}\right\}^4, \qquad G(0) = A, \tag{3.18a}$$

$$R(k+1) = R(k)[4I_m + 40G(k) + 20G^2(k)]^{-1}[I_m + 31G(k) + 31G^2(k) + G^3(k)], \qquad R(0) = I_m, \tag{3.18b}$$

$$\lim_{k\to\infty} R(k) = \sqrt[4]{A}. \tag{3.18c}$$
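The coefficient pairs in (3.10)-(3.18) follow mechanically from (3.7b). Since $Y_{i,1} = I_m$, every later $Y_{i,j}$ is a polynomial in $G(k)$. One can therefore run (3.7b) on coefficient lists (list index = power of $G$) and read off two polynomials. $Y_{1,r}$ is the right-hand factor in the $R$ update, and $Y_{2,r}$ is the inverted factor. The hypothetical helper below is a sketch under that representation. It reproduces the listed pairs and the general $r = 3$ coefficients in (3.9c)-(3.9d). As a further check, both polynomials evaluate to $n^{r-1}$ at $G = I_m$, which is consistent with $R(k+1) = R(k)$ at the fixed point.

```python
def padd(p, q):
    """Add two coefficient lists (index = power of G)."""
    out = [0] * max(len(p), len(q))
    for k, c in enumerate(p):
        out[k] += c
    for k, c in enumerate(q):
        out[k] += c
    return out

def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def coefficients(n, r):
    """Return (Y_{1,r}, Y_{2,r}) as coefficient lists in G, via (3.7b) with Y_{i,1} = 1."""
    Y = [[1] for _ in range(n)]
    for _ in range(2, r + 1):
        new_Y = []
        for i in range(n):
            lower, upper = [0], [0]
            for l in range(i + 1):
                lower = padd(lower, Y[l])
            for l in range(i + 1, n):
                upper = padd(upper, Y[l])
            new_Y.append(padd(lower, [0] + upper))   # prepending 0 multiplies by G
        Y = new_Y
    return trim(Y[0]), trim(Y[1])

print(coefficients(4, 4))   # numerator and denominator coefficients for (3.18)
```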


Theorem 3.3.1

The principal nth root algorithm in (3.8) with the rth-order convergence rate ($r \ge 2$) is numerically stable in the sense that the perturbations arising from the round-off errors at the kth iteration have only a bounded effect on succeeding iterates if no new round-off errors are introduced on succeeding iterates.

Proof

The convergence rate of the algorithm in (3.8) is the same as that of the algorithm in (3.1), because the algorithm in (3.8) is derived from the algorithm in (3.1). The numerical stability of the algorithm in (3.8) can be analyzed as follows.

Consider the principal nth root algorithm in (3.9), which has a quadratic convergence rate ($r = 2$). Our objective is to show that the algorithm in (3.9) is numerically stable in the sense that perturbations arising from the rounding errors at the kth iteration do not lead to unbounded perturbations on succeeding iterates. Let the perturbed iterates be $\hat{G}(k)$ and $\hat{R}(k)$, and let the associated round-off errors be $E(k)$ and $F(k)$, respectively. Hence, by definition, $\hat{G}(k) = G(k) + E(k)$ and $\hat{R}(k) = R(k) + F(k)$. Our purpose is to analyze how the error matrices $E(k)$ and $F(k)$ propagate at the (k+1)th stage. To simplify the analysis, we assume that no round-off errors occur when we compute $G(k+1)$ and $R(k+1)$ in the following equations:
$$G(k+1) = G(k)\left\{[2I_m + (n-2)G(k)][I_m + (n-1)G(k)]^{-1}\right\}^n, \tag{3.19a}$$

$$R(k+1) = R(k)[2I_m + (n-2)G(k)]^{-1}[I_m + (n-1)G(k)]. \tag{3.19b}$$

We substitute $\hat{G}(k)\ (= G(k) + E(k))$ and $\hat{R}(k)\ (= R(k) + F(k))$ into (3.19a) and use the perturbation formula

$$(D + \Delta)^{-1} = D^{-1} - D^{-1}\Delta D^{-1} + o(\|\Delta\|), \tag{3.20}$$

where $D$ and $\Delta$ are matrices and $o(\|\Delta\|)$ denotes the higher-order terms in $\|\Delta\|$. Omitting the higher-order terms in $E(k)$ and $F(k)$ results in

$$G(k+1) + E(k+1) = [G(k) + E(k)]\Big\{[2I_m + (n-2)G(k) + (n-2)E(k)] \times \Big\{[I_m + (n-1)G(k)]^{-1} - [I_m + (n-1)G(k)]^{-1}(n-1)E(k)[I_m + (n-1)G(k)]^{-1}\Big\}\Big\}^n \tag{3.21a}$$

$$\begin{aligned}
= [G(k) + E(k)]\Big\{&[2I_m + (n-2)G(k)][I_m + (n-1)G(k)]^{-1}\\
&- [2I_m + (n-2)G(k)][I_m + (n-1)G(k)]^{-1}(n-1)E(k)[I_m + (n-1)G(k)]^{-1}\\
&+ (n-2)E(k)[I_m + (n-1)G(k)]^{-1}\\
&- (n-2)E(k)[I_m + (n-1)G(k)]^{-1}(n-1)E(k)[I_m + (n-1)G(k)]^{-1}\Big\}^n.
\end{aligned} \tag{3.21b}$$

When $k \to \infty$, $R(k) \to \sqrt[n]{A}$. Hence $G(k) = AR^{-n}(k) \to I_m$. Thus, (3.21b) becomes

$$G(k+1) + E(k+1) = [I_m + E(k)]\left\{I_m - \frac{n-1}{n}E(k) + \frac{n-2}{n}E(k)\right\}^n \tag{3.21c}$$

$$= [I_m + E(k)]\left\{I_m - \frac{E(k)}{n}\right\}^n, \tag{3.21d}$$

$$G(k+1) + E(k+1) \approx [I_m + E(k)][I_m - E(k)] \approx I_m. \tag{3.21e}$$

Substituting $k \to \infty$ into (3.9a), we obtain

$$G(k+1) = I_m. \tag{3.21f}$$

Thus, from (3.21e) and (3.21f), we prove that

$$E(k+1) = O_m. \tag{3.21g}$$


Similarly, substituting $\hat{G}(k)$ and $\hat{R}(k)$ into (3.19b), we get

$$R(k+1) + F(k+1) = [R(k) + F(k)][I_m + (n-1)G(k) + (n-1)E(k)] \times \Big\{[2I_m + (n-2)G(k)]^{-1} - [2I_m + (n-2)G(k)]^{-1}(n-2)E(k)[2I_m + (n-2)G(k)]^{-1}\Big\} \tag{3.22a}$$

$$\begin{aligned}
= [R(k) + F(k)]\Big\{&[I_m + (n-1)G(k)][2I_m + (n-2)G(k)]^{-1}\\
&- [I_m + (n-1)G(k)][2I_m + (n-2)G(k)]^{-1}(n-2)E(k)[2I_m + (n-2)G(k)]^{-1}\\
&+ (n-1)E(k)[2I_m + (n-2)G(k)]^{-1}\Big\}.
\end{aligned} \tag{3.22b}$$

Subtracting (3.19b) from (3.22b) and substituting $G(k) = I_m$ for $k \to \infty$ into (3.22b), we get

$$F(k+1) = F(k) + R(k)\left[-\frac{n-2}{n}E(k) + \frac{n-1}{n}E(k)\right] \tag{3.22c}$$

$$= F(k) + \frac{R(k)E(k)}{n}. \tag{3.22d}$$

The block-state equations in (3.21g) and (3.22d), with a null system matrix and an identity system matrix, respectively, are stable because the eigenvalues of the system matrices in (3.21g) and (3.22d) are zeros and ones, respectively. If we make the further assumption that no new round-off errors are introduced at the (k+2)th stage of the iterations, then (3.22d) becomes $F(k+2) = F(k+1) + R(k+1)E(k+1)/n = F(k+1)$. This suggests that the perturbations arising from the round-off errors at the kth iteration have only bounded effects on succeeding iterates. Thus, the algorithm in (3.9) is numerically stable provided that the above assumptions hold. In a similar manner, we can prove that the algorithm in (3.8) is numerically stable for $r \ge 3$. ■
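The first-order conclusions (3.21g) and (3.22d) can be illustrated with scalars. This sketch assumes $A = 8$ and $n = 3$, so that $\sqrt[3]{A} = 2$ and $G = 1$ at the fixed point. It also uses small made-up perturbations $\varepsilon$ and $\delta$ in place of $E(k)$ and $F(k)$. One exact step of (3.19) should leave $E(k+1)$ negligible (of higher order) and give $F(k+1) \approx F(k) + R(k)E(k)/n$. A further exact step should then leave $F$ essentially unchanged.

```python
def step(G, R, n):
    """One exact (round-off free) step of (3.19a)-(3.19b) for scalars."""
    N = 2 + (n - 2) * G
    D = 1 + (n - 1) * G
    return G * (N / D) ** n, R * D / N

n, R_star = 3, 2.0            # A = 8, so the principal cube root is 2 and G = A / R^3 = 1
eps, delta = 1e-6, 1e-6       # made-up perturbations standing in for E(k) and F(k)
G1, R1 = step(1.0 + eps, R_star + delta, n)
E1 = G1 - 1.0                 # (3.21g): should be negligible
F1 = R1 - R_star              # (3.22d): should be close to delta + R_star * eps / n
G2, R2 = step(G1, R1, n)      # no new error: F(k+2) should equal F(k+1) to first order
print(E1, F1 - (delta + R_star * eps / n), R2 - R1)
```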

One application of the principal nth root of a matrix is the derivation of the matrix-sector algorithm, which in turn has many applications in solving control-system problems [26,27].
3.4 Illustrative Example

Example 3.4.1

Given a stiff matrix [22,29],

$$A = \begin{bmatrix}
1 & 0 & 0 & 0\\
-1 & 0.01 & 0 & 0\\
-1 & -1 & 100 & 100\\
-1 & -1 & -100 & 100
\end{bmatrix},$$

where $\sigma(A) = \{0.01, 1, 100 \pm j100\}$, it is desired to find $\sqrt[3]{A}$. The exact solution is

$$\sqrt[3]{A} = \begin{bmatrix}
1 & 0 & 0 & 0\\
\ast & 0.215444 & 0 & 0\\
\ast & -0.013483 & 5.032481 & 1.348449\\
\ast & -0.048173 & -1.348449 & 5.032481
\end{bmatrix},$$

where each eigenvalue of $\sqrt[3]{A}$ $(= \{0.215444, 1, 5.032481 \pm j1.348449\})$ is the principal cube root of the corresponding eigenvalue in $\sigma(A)$. Let us define the absolute error $e_a(k) = \|R(k) - \sqrt[3]{A}\|_\infty$, where $R(k)$ is the computed cube root of $A$ at the kth iteration. For this example, the upper limit for the iteration index $k$ is taken as 30.

Applying the algorithm in [21] with $n = 3$ and $r = 2$ in (3.3), we obtain the results shown in Table 3.1. This algorithm converges in the usual sense at $k = 6$ with $e_a(k) = 4.347 \times 10^{-7}$; however, it then diverges very quickly. Therefore, this algorithm is numerically unstable.

Applying the algorithm with $n = 3$ and $r = 2$ in (3.11), we obtain the results shown in Table 3.2. This algorithm converges at $k = 6$ with $e_a(k) = 2.387 \times 10^{-15}$ and then remains invariant for $k > 6$. Employing the algorithm with $n = 3$ and $r = 3$ in (3.14), we obtain the results shown in Table 3.3. This algorithm converges at $k = 5$ with $e_a(k) = 6.610 \times 10^{-12}$ and then remains invariant at $e_a(k) = 6.577 \times 10^{-12}$ for $k \ge 6$. Using the algorithm with $n = 3$ and $r = 4$ in (3.17), we obtain the results shown in Table 3.4. This algorithm converges at $k = 4$ with $e_a(k) = 1.146 \times 10^{-11}$. Also, the relative error $e_r(k) = \|R(k) - R(k-1)\|_\infty$ remains invariant at 1.3877787807814457E-16 for $k \ge 5$.

These results show that the algorithms proposed in this chapter are numerically stable for this example. Note that an algorithm with a higher convergence rate does not necessarily require less computation time.
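Example 3.4.1 can be re-run with the stable second-order pair (3.11) in plain Python. The sketch below uses hypothetical helper names, Gauss-Jordan inversion with partial pivoting, and the 30-iteration limit stated in the example. It checks the legible entries of the exact solution and that $R^3$ reproduces $A$. The first-column check relies on $x_{21}(1 + c + c^2) = -1$ with $c = 0.01^{1/3}$. That relation follows from cubing the lower-triangular leading $2\times2$ block $\begin{bmatrix}1 & 0\\ x_{21} & c\end{bmatrix}$. This is a sketch only; its rounding behavior may differ in detail from the tables below.

```python
def identity(m):
    return [[float(i == j) for j in range(m)] for i in range(m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matadd(a, b, s=1.0):
    """Return a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inversion with partial pivoting."""
    m = len(a)
    aug = [list(row) + e for row, e in zip(a, identity(m))]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

A = [[1.0, 0.0, 0.0, 0.0],
     [-1.0, 0.01, 0.0, 0.0],
     [-1.0, -1.0, 100.0, 100.0],
     [-1.0, -1.0, -100.0, 100.0]]

def cube_root_311(A, iters=30):
    """Second-order stable pair (3.11a)-(3.11b) for n = 3."""
    m = len(A)
    I = identity(m)
    G = [row[:] for row in A]                   # G(0) = A
    R = identity(m)                             # R(0) = I_m
    for _ in range(iters):
        N = matadd(matadd(I, I), G)             # 2 I_m + G(k)
        D = matadd(I, G, 2.0)                   # I_m + 2 G(k)
        S = matmul(N, inverse(D))
        G = matmul(G, matmul(S, matmul(S, S)))  # (3.11a)
        R = matmul(R, matmul(inverse(N), D))    # (3.11b)
    return R

R = cube_root_311(A)
for row in R:
    print(["%.6f" % v for v in row])
```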

3.5  Conclusion 

Rapidly convergent and more stable recursive algorithms for finding the principal nth root of a matrix have been developed. The developed recursive algorithms can be applied to an ill-conditioned matrix containing large and small eigenvalues. By means of a perturbation analysis with suitable assumptions, it is shown that the proposed recursive algorithms are numerically more stable than the algorithms in [20,21,26]. The absolute numerical stability of the proposed algorithms has not been analyzed in this chapter. The developed algorithms will enhance the capabilities of existing computational algorithms, such as the principal nth root algorithm, the matrix-sign algorithm, and the matrix-sector algorithm, which in turn can be applied to many control-system problems.
k     e_a(k)
1     4.381223792317501
2     3.882609786095389
3     0.5926630610994657
4     0.1829671341223767
5     2.3013953289560130E-03
6     4.3471165761532760E-07
7     1.7435863254853623E-04
8     3.3132393959246582E-02
9     1.932622803044509
10    1202.571712161634
11    248467.2691452321
12    11959194.08924662
13    8210528857.322972
14    1837674003853.891
15    111162541995391.2
16    3.9045242057133462E+17
17    1.7048772918039683E+21
18    2.5431940655482168E+23
19    7.6820331894349227E+28

Table 3.1  Error analysis: the second-order numerically unstable algorithm
k     e_a(k)
1     4.381223792317501
2     2.424605807720003
3     0.2299970454407299
4     2.0625460001852391E-04
5     1.4119261315670428E-13
6     2.3869795029440866E-15
7     2.3869795029440866E-15
8     2.3869795029440866E-15
9     2.3869795029440866E-15
10    2.3869795029440866E-15
11    2.3869795029440866E-15
12    2.3869795029440866E-15
13    2.3869795029440866E-15
14    2.3869795029440866E-15
15    2.3869795029440866E-15
16    2.3869795029440866E-15
17    2.3869795029440866E-15
18    2.3869795029440866E-15
19    2.3869795029440866E-15
20    2.3869795029440866E-15
21    2.3869795029440866E-15
22    2.3869795029440866E-15
23    2.3869795029440866E-15
24    2.3869795029440866E-15
25    2.3869795029440866E-15
26    2.3869795029440866E-15
27    2.3869795029440866E-15
28    2.3869795029440866E-15
29    2.3869795029440866E-15
30    2.3869795029440866E-15

Table 3.2  Error analysis: the second-order numerically stable algorithm




k     e_a(k)
1     28.03572760609522
2     5.136272016363972
3     0.4574895834797120
4     6.2643637102149929E-04
6     6.5773046731277780E-12
7     6.5773046731277780E-12
8     6.5773046731277780E-12
9     6.5773046731277780E-12
10    6.5773046731277780E-12
11    6.5773046731277780E-12
12    6.5773046731277780E-12
13    6.5773046731277780E-12
14    6.5773046731277780E-12
15    6.5773046731277780E-12
16    6.5773046731277780E-12
17    6.5773046731277780E-12
18    6.5773046731277780E-12
19    6.5773046731277780E-12
20    6.5773046731277780E-12
21    6.5773046731277780E-12
22    6.5773046731277780E-12
23    6.5773046731277780E-12
24    6.5773046731277780E-12
25    6.5773046731277780E-12
26    6.5773046731277780E-12
27    6.5773046731277780E-12
28    6.5773046731277780E-12
29    6.5773046731277780E-12
30    6.5773046731277780E-12

Table 3.3  Error analysis: the third-order numerically stable algorithm
k     e_a(k)
1     3.882609786095389
2     0.732680674532077
3     3.8011418337163816E-04
4     1.1456591751668466E-11
5     1.1456588715902383E-11
6     1.1456585896976734E-11
7     1.1456590016944990E-11
8     1.1456594136913245E-11
9     1.1456598255881501E-11
10    1.1456602376849756E-11
11    1.1456606496818011E-11
12    1.1456610616786267E-11
13    1.1456614736754522E-11
14    1.1456618856722778E-11
15    1.1456622976691033E-11
16    1.1456627096659289E-11
17    1.1456631216627544E-11
18    1.1456635336595800E-11
19    1.1456639456564055E-11
20    1.1456643576532310E-11
21    1.1456647696500566E-11
22    1.1456651816468821E-11
23    1.1456655936437077E-11
24    1.1456660056405332E-11
25    1.1456664176373588E-11
26    1.1456668296341843E-11
27    1.1456672416310099E-11
28    1.1456676536278354E-11
29    1.1456680656246609E-11
30    1.1456684776214865E-11

Table 3.4  Error analysis: the fourth-order numerically stable algorithm


Chapter 4

Fast and Stable Algorithms for Computing the Generalized Matrix-sector Function and the Separation of Matrix Eigenvalues


The matrix-sector function of A has been generalized to the matrix-sector function of g(A), where the complex matrix A may have a real or complex characteristic polynomial and g(A) is a matrix function of a conformal mapping. Based on the computationally fast and numerically stable algorithm for computing the principal nth root of a complex matrix, rapidly convergent and more stable recursive algorithms for finding the matrix-sector function and the generalized matrix-sector function are developed in this chapter. Moreover, the generalized matrix-sector function of A is employed to separate the matrix eigenvalues relative to a sector, a circle, and a sector of a circle in the complex plane without actually computing the characteristic polynomial or the matrix eigenvalues themselves. The generalized matrix-sector function of A is also used to carry out the block-diagonalization and block-triangularization of a system matrix, which are useful in developing applications to mathematical science and control-system problems [27,61].


4.1  Introduction 

The matrix-sign function introduced by Roberts [17] has been successfully applied to systems science and engineering problems [1,9,17,30], [33]-[37], such as the solution of the matrix Lyapunov and Riccati equations and the separation of matrix eigenvalues relative to strips, trapezoids, and circles in the complex plane without actually computing the characteristic polynomial or the matrix eigenvalues themselves. The important features of the matrix-sign function [9,17] in systems science and engineering problems are: (a) the matrix-sign function preserves the eigenvectors of a complex matrix, which may have a real or complex characteristic polynomial; (b) the associated matrix-sign algorithms converge quickly, and the convergence speed is independent of the dimension of the system.

The matrix-sign function of A, which may be considered as a matrix-2-sector function of A, can be expressed as $\mathrm{Sign}(A) = A[\sqrt{A^2}]^{-1}$, where $\sqrt{A^2}$ is the principal square root of the complex matrix $A^2$. It has been extended to the matrix-sector function of A [26], which is a matrix-n-sector function of A and can be expressed as $\mathrm{Sector}_n(A) = A[\sqrt[n]{A^n}]^{-1}$, where $\sqrt[n]{A^n}$ is the principal nth root of $A^n$. One application of the principal nth root of a matrix is in the derivation of the matrix-sector algorithm, which in turn has many applications in solving control-system problems. The matrix-sector function of A has been used for the separation of the matrix eigenvalues relative to an open sector of the complex plane and for the block-diagonalization of a system matrix.

The purposes of this chapter are: (a) to derive fast and stable algorithms for computing the matrix-sector function; (b) to generalize the matrix-sector function of A to the matrix-sector function of g(A), where g(A) is the matrix function of a conformal mapping; (c) to apply the generalized matrix-sector function of A to the system matrix A for the separation of matrix eigenvalues relative to a sector, a circle, and a sector of a circle; and (d) to use the generalized matrix-sector function of A for the block-diagonalization and block-triangularization of the system matrix A.


4.2  Definition  and  Properties  of  the  Matrix-sector  Function 

To develop fast and stable algorithms for computing the matrix-sector function, the generalized matrix-sector function, and their associated functions with applications, we first review the scalar- and matrix-sector functions.

The scalar-n-sector function of λ is defined as follows.

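Definition 4.2.1 [26]

Let $\lambda \in \mathbb{C}$ be expressed by $\lambda = |\lambda|e^{j\theta}$, where $\lambda \neq 0$, $j = \sqrt{-1}$, $\theta \in [0, 2\pi)$, and $\theta \neq 2\pi(q + \frac{1}{2})/n$ for $q \in [0, n-1]$. Then the scalar-n-sector function of λ, denoted by $\mathrm{Sector}_n(\lambda)$ or $S_n(\lambda)$, is

$$\mathrm{Sector}_n(\lambda) = S_n(\lambda) = \frac{\lambda}{\sqrt[n]{\lambda^n}} = e^{j2\pi q/n} \quad \text{for } q \in [0, n-1], \qquad (4.1)$$

where λ lies inside the sector of $\mathbb{C}$ bounded by the sector angles $2\pi(q - \frac{1}{2})/n$ and $2\pi(q + \frac{1}{2})/n$, and $\sqrt[n]{\lambda^n}$ is the principal nth root of $\lambda^n$. When n = 2, the scalar-sector function of λ becomes the sign function of λ [9,17], i.e.,

$$\mathrm{Sector}_2(\lambda) = S_2(\lambda) = \frac{\lambda}{\sqrt{\lambda^2}} = \pm 1 = \mathrm{Sign}(\lambda) \quad \text{for } q \in [0, 1]. \qquad (4.2)$$

□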
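As a quick numerical check of (4.1) and (4.2), the following Python sketch computes $S_n(\lambda)$ both as $\lambda/\sqrt[n]{\lambda^n}$ (with the principal root) and as $e^{j2\pi q/n}$, where q is read off from arg λ. The helper names `sector_index` and `sector_scalar` are illustrative only and do not appear in the text.

```python
import cmath
import math


def sector_index(lam: complex, n: int) -> int:
    """Return q with 2*pi*(q - 1/2)/n < arg(lam) < 2*pi*(q + 1/2)/n (arg taken in [0, 2*pi))."""
    theta = cmath.phase(lam) % (2 * math.pi)
    return int(math.floor(theta * n / (2 * math.pi) + 0.5)) % n


def sector_scalar(lam: complex, n: int) -> complex:
    """S_n(lam) = lam / (principal n-th root of lam**n), as in (4.1)."""
    if lam == 0:
        raise ValueError("S_n is undefined at lambda = 0")
    w = lam ** n
    # principal n-th root: |w|**(1/n) * exp(j*Arg(w)/n), with Arg(w) in (-pi, pi]
    root = abs(w) ** (1.0 / n) * cmath.exp(1j * cmath.phase(w) / n)
    return lam / root


# -2 + j1 lies in the sector q = 2 for n = 4, so S_4(-2 + j1) = exp(j*pi) = -1.
example = sector_scalar(-2 + 1j, 4)
```

For n = 2 the same function returns +1 or -1 according to the sign of the real part, which is (4.2).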


The matrix-sector function of A is defined as follows.

Definition 4.2.2 [26,27]

Let $A \in \mathbb{C}^{m\times m}$, $\sigma(A) = \{\lambda_i, i = 1, 2, \ldots, m\}$, $\lambda_i \neq 0$, and $\arg(\lambda_i) \neq 2\pi(k + \frac{1}{2})/n$ for $k \in [0, n-1]$. In addition, let M be a modal matrix of A, i.e., $A = MJM^{-1}$, where J is a matrix containing the Jordan blocks of A. Then the matrix-sector function of A, denoted by $\mathrm{Sector}_n(A)$ or $S_n(A)$, is defined as

$$\mathrm{Sector}_n(A) = S_n(A) = M\Big[\bigoplus_{i=1}^{m} S_n(\lambda_i)\Big]M^{-1}, \qquad (4.3)$$

where $S_n(\lambda_i)$ is the scalar-n-sector function of $\lambda_i$. □

The matrix-sector function $S_n(A)$ defined in Definition 4.2.2 can be expressed as

$$S_n(A) = A\big[\sqrt[n]{A^n}\big]^{-1}, \qquad (4.4a)$$

where $\sqrt[n]{A^n}$ is the principal nth root of $A^n$. Also, the associated matrix-sign function, denoted by Sign(A) [9,17], becomes

$$S_2(A) = A\big[\sqrt{A^2}\big]^{-1} = \mathrm{Sign}(A). \qquad (4.4b)$$

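Moreover, the partitioned matrix-sector function of A can be described as follows.

Definition 4.2.3 [26,27]

Let $A \in \mathbb{C}^{m\times m}$, $\sigma(A) = \{\lambda_i, i = 1, 2, \ldots, m\}$, $\lambda_i \neq 0$, and $\arg(\lambda_i) \neq 2\pi(p + \frac{1}{2})/n$ for $p \in [0, n-1]$. Also, let M be a modal matrix of A. Then the qth matrix-n-sector function of A, denoted by $S_n^{(q)}(A)$, is defined by

$$S_n^{(q)}(A) = M\Big[\bigoplus_{i=1}^{m} S_n^{(q)}(\lambda_i)\Big]M^{-1}, \qquad (4.5)$$

where the qth scalar-n-sector function of $\lambda_i$, denoted by $S_n^{(q)}(\lambda_i)$, is

$$S_n^{(q)}(\lambda_i) = \begin{cases} 1, & \text{when } 2\pi(q - \frac{1}{2})/n < \arg(\lambda_i) < 2\pi(q + \frac{1}{2})/n \text{ for } q \in [0, n-1], \\ 0, & \text{otherwise.} \end{cases}$$

□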

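The qth matrix-n-sector function of A can be obtained from the following equation:

$$S_n^{(q)}(A) = \frac{1}{n}\sum_{i=1}^{n}\big[S_n(A)\,e^{-j2\pi q/n}\big]^{i-1} \quad \text{for } q \in [0, n-1]. \qquad (4.6)$$

This holds because each eigenvalue of $S_n(A)e^{-j2\pi q/n}$ is an nth root of unity $e^{j2\pi(p-q)/n}$, and the average of its first n powers is 1 when p = q and 0 otherwise.

Separation of matrix eigenvalues is one of the applications of the matrix-sector function in systems theory. For example, the number of eigenvalues of $A \in \mathbb{C}^{m\times m}$ that lie within the sector angles $2\pi(q - \frac{1}{2})/n$ and $2\pi(q + \frac{1}{2})/n$, where $q \in [0, n-1]$ and n > 1, is $\mathrm{trace}(S_n^{(q)}(A))$.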
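A short NumPy sketch of (4.6): starting from $S_n(A)$ (computed here by eigendecomposition purely for illustration), it forms each projector $S_n^{(q)}(A)$ and reads off the eigenvalue count per sector from the trace. The test matrix and the helper names `sector_eig` and `sector_projector` are assumptions made for this example.

```python
import numpy as np


def sector_eig(A, n):
    """S_n(A) via eigendecomposition, as in (4.3); assumes A is diagonalizable."""
    lam, M = np.linalg.eig(A)
    w = lam ** n
    roots = np.abs(w) ** (1.0 / n) * np.exp(1j * np.angle(w) / n)
    return M @ np.diag(lam / roots) @ np.linalg.inv(M)


def sector_projector(S, n, q):
    """(4.6): S_n^{(q)}(A) = (1/n) * sum_{i=1}^{n} [S_n(A) exp(-j 2 pi q / n)]^{i-1}."""
    X = S * np.exp(-2j * np.pi * q / n)
    term = np.eye(S.shape[0], dtype=complex)
    total = np.zeros_like(term)
    for _ in range(n):
        total += term
        term = term @ X
    return total / n


rng = np.random.default_rng(1)
M = np.eye(4) + 0.3 * (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
A = M @ np.diag([-2 + 1j, -10, -1 + 2j, 3 + 0.5j]) @ np.linalg.inv(M)

n = 4
S = sector_eig(A, n)
P = [sector_projector(S, n, q) for q in range(n)]
counts = [int(round(np.trace(Pq).real)) for Pq in P]   # eigenvalues per sector
```

With the chosen eigenvalues, `counts` is [1, 1, 2, 0]: 3 + j0.5 lies in sector 0, -1 + j2 in sector 1, and -2 + j1 and -10 in sector 2. Each `P[q]` is idempotent and the projectors sum to the identity.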
4.3 Fast and Stable Algorithm for Computing the Matrix-sector Function

One of the applications of the principal nth root of a matrix is in the derivation of the matrix-sector algorithm, which in turn has many applications in solving control-system problems [26,27]. The fast and stable matrix-sector algorithm corresponding to the fast and stable principal nth root algorithm in (3.8) can be obtained by modifying (3.8) appropriately.

The direct use of the algorithm in (3.8) to compute $\sqrt[n]{A^n}$, and hence the matrix-sector function in (4.4a), may give numerically unstable results when A is ill-conditioned, because it involves the computation of $A^n$, which may itself be numerically unstable. To overcome this difficulty, we develop a fast and stable algorithm for computing the matrix-sector function as follows.

Defining $Q(k) = AR^{-1}(k)$ and $G(k) = A^nR^{-n}(k) = Q^n(k)$, and using $R(0) = I_m$ and $G(0) = A^n$, we obtain the simplified matrix-sector algorithm from the algorithm in (3.8) for k = 0, 1, 2, ... as follows:

P=1  <=i+l 

Yi^i{k)  =  Im  for  i  =  2,3,  •  •  •  ,r,  and  t  =  1,2,  •  •  •  ,71,  (4.7a) 

Q{k  +  1)  =  Q{k)Y2j{k)Y-J{k),  Q{0)  =  A, 


$$\lim_{k\to\infty} Q(k) = S_n(A), \qquad (4.7b)$$


where  n  denotes  the  index  of  the  nth  root  of  a  matrix  and  r  is  the  order  of  the 
desired  convergence  rate. 

Corollary  4.3.1 

The algorithm in (4.7) with the rth-order convergence rate (r ≥ 2) is numerically stable in the sense that an error introduced at the kth iteration has only a bounded effect on succeeding iterates, provided that no new errors are introduced in the succeeding iterates.

Proof 

The  proof  of  Corollary  4.3.1  is  similar  to  that  in  Theorem  3.3.1.  ■ 

Some  explicit  forms  of  the  algorithm  in  (4.7)  are  listed  below. 

When r = 2, (4.7) becomes

$$Q(k+1) = Q(k)\big[2I_m + (n-2)Q^n(k)\big]\big[I_m + (n-1)Q^n(k)\big]^{-1}, \quad Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_n(A). \qquad (4.8)$$
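The second-order iteration (4.8) translates directly into a few lines of NumPy. The sketch below is an illustration under stated assumptions: a relative-change stopping rule of our own choosing, and a diagonalizable test matrix whose exact $S_n(A)$ is available from an eigendecomposition for comparison. Because $Q(k)[2I_m + (n-2)Q^n(k)]$ and $I_m + (n-1)Q^n(k)$ are both polynomials in Q(k), they commute, so the right-hand inverse can be applied with a linear solve. `sector_r2` and `sector_eig` are hypothetical helper names.

```python
import numpy as np


def sector_r2(A, n, tol=1e-12, max_iter=100):
    """Second-order iteration (4.8):
    Q(k+1) = Q(k) [2I + (n-2) Q^n(k)] [I + (n-1) Q^n(k)]^{-1},  Q(0) = A.
    """
    I = np.eye(A.shape[0], dtype=complex)
    Q = A.astype(complex)
    for _ in range(max_iter):
        Qn = np.linalg.matrix_power(Q, n)
        num = Q @ (2 * I + (n - 2) * Qn)
        den = I + (n - 1) * Qn
        # num and den commute (polynomials in Q), so num @ inv(den) == solve(den, num)
        Q_next = np.linalg.solve(den, num)
        done = np.linalg.norm(Q_next - Q, 1) <= tol * np.linalg.norm(Q_next, 1)
        Q = Q_next
        if done:
            break
    return Q


def sector_eig(A, n):
    """Reference value of S_n(A) from an eigendecomposition, as in (4.3)."""
    lam, M = np.linalg.eig(A)
    w = lam ** n
    roots = np.abs(w) ** (1.0 / n) * np.exp(1j * np.angle(w) / n)
    return M @ np.diag(lam / roots) @ np.linalg.inv(M)


rng = np.random.default_rng(2)
M = np.eye(4) + 0.3 * (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
A = M @ np.diag([-2 + 1j, -10, -1 + 2j, 3 + 0.5j]) @ np.linalg.inv(M)
```

For n = 2 and n = 4 the iterate matches the eigendecomposition value. The test eigenvalues are chosen away from the sector boundaries. Note that -10 lies on a boundary when n = 3, so n = 3 is not used with this matrix.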


When r = 3, (4.7) becomes

$$Q(k+1) = Q(k)\Big[3I_m + \frac{n^2+5n-12}{2}Q^n(k) + \frac{n^2-5n+6}{2}Q^{2n}(k)\Big]\Big[I_m + \frac{n^2+3n-4}{2}Q^n(k) + \frac{n^2-3n+2}{2}Q^{2n}(k)\Big]^{-1},$$

$$Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_n(A). \qquad (4.9)$$
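A small check of the third-order iteration (4.9). The sketch assumes that each fractional coefficient in (4.9) has denominator 2. That is the reading which reproduces the special cases (4.13)-(4.15) and makes numerator and denominator both equal $n^2$ at the fixed point $Q^n = I_m$. It then runs the iteration on a diagonal test matrix. `third_order_coeffs` and `sector_r3` are illustrative names.

```python
from fractions import Fraction

import numpy as np


def third_order_coeffs(n):
    """Coefficients of (4.9), assuming each fractional coefficient has denominator 2.

    Returns (c0, c1, c2) for the numerator 3I + c1 Q^n + c2 Q^{2n}
    and (d0, d1, d2) for the denominator I + d1 Q^n + d2 Q^{2n}.
    """
    num = (Fraction(3), Fraction(n * n + 5 * n - 12, 2), Fraction(n * n - 5 * n + 6, 2))
    den = (Fraction(1), Fraction(n * n + 3 * n - 4, 2), Fraction(n * n - 3 * n + 2, 2))
    return num, den


def sector_r3(A, n, iters=30):
    """Third-order iteration (4.9) with Q(0) = A, run for a fixed number of steps."""
    (c0, c1, c2), (d0, d1, d2) = third_order_coeffs(n)
    I = np.eye(A.shape[0], dtype=complex)
    Q = A.astype(complex)
    for _ in range(iters):
        Qn = np.linalg.matrix_power(Q, n)
        Q2n = Qn @ Qn
        num = Q @ (float(c0) * I + float(c1) * Qn + float(c2) * Q2n)
        den = float(d0) * I + float(d1) * Qn + float(d2) * Q2n
        Q = np.linalg.solve(den, num)   # num and den commute (polynomials in Q)
    return Q
```

For example, `third_order_coeffs(4)` gives numerator coefficients (3, 12, 1) and denominator coefficients (1, 12, 3), matching (4.15).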


Substituting n = 2, 3, and 4 into (4.8) and (4.9), we obtain the following results.

When r = 2 and n = 2, we have

$$Q(k+1) = Q(k)\,[2I_m]\,\big[I_m + Q^2(k)\big]^{-1}, \quad Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_2(A), \qquad (4.10a)$$

or

$$Q^{-1}(k+1) = \tfrac{1}{2}\big[Q^{-1}(k) + Q(k)\big], \quad Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_2(A). \qquad (4.10b)$$


Note  that  Qnik)  =  Q"^(A:)  for  n  =  2  only. 
When r = 2 and n = 3, we have

$$Q(k+1) = Q(k)\big[2I_m + Q^3(k)\big]\big[I_m + 2Q^3(k)\big]^{-1}, \quad Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_3(A). \qquad (4.11)$$

When r = 2 and n = 4, we have

$$Q(k+1) = Q(k)\big[2I_m + 2Q^4(k)\big]\big[I_m + 3Q^4(k)\big]^{-1}, \quad Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_4(A). \qquad (4.12)$$

When r = 3 and n = 2, we have

$$Q(k+1) = Q(k)\big[3I_m + Q^2(k)\big]\big[I_m + 3Q^2(k)\big]^{-1}, \quad Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_2(A). \qquad (4.13)$$

When r = 3 and n = 3, we have

$$Q(k+1) = Q(k)\big[3I_m + 6Q^3(k)\big]\big[I_m + 7Q^3(k) + Q^6(k)\big]^{-1}, \quad Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_3(A). \qquad (4.14)$$

When r = 3 and n = 4, we have

$$Q(k+1) = Q(k)\big[3I_m + 12Q^4(k) + Q^8(k)\big]\big[I_m + 12Q^4(k) + 3Q^8(k)\big]^{-1}, \quad Q(0) = A, \quad \lim_{k\to\infty} Q(k) = S_4(A). \qquad (4.15)$$

Note that the algorithm in (4.10b) is the commonly used matrix-sign algorithm [1,9,17] for r = 2. Comparing the algorithm in (4.8) with those in [26,27] for determining the matrix-sector function, it can be noted that the proposed algorithms significantly improve the computational aspects of the existing algorithms.
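To illustrate the relation between the r = 2, n = 2 form (4.10a) and the familiar Newton sign iteration $X(k+1) = \frac{1}{2}[X(k) + X^{-1}(k)]$, $X(0) = A$, the sketch below checks numerically that after every step the (4.10a) iterate is the inverse of the Newton iterate, and that both reach the same sign matrix. This follows from the scalar identity $2q/(1+q^2) = 1/[\frac{1}{2}(q + q^{-1})]$. The test matrix and helper names are illustrative assumptions.

```python
import numpy as np


def form_4_10a(A, k):
    """k steps of (4.10a): Q(k+1) = 2 Q(k) [I + Q(k)^2]^{-1}, Q(0) = A."""
    I = np.eye(A.shape[0], dtype=complex)
    Q = A.astype(complex)
    for _ in range(k):
        Q = np.linalg.solve(I + Q @ Q, 2 * Q)   # 2Q and I + Q^2 commute
    return Q


def newton_sign(A, k):
    """k steps of the Newton sign iteration X(k+1) = [X(k) + X(k)^{-1}] / 2, X(0) = A."""
    X = A.astype(complex)
    for _ in range(k):
        X = 0.5 * (X + np.linalg.inv(X))
    return X


rng = np.random.default_rng(3)
M = np.eye(4) + 0.3 * (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
A = M @ np.diag([-2 + 1j, -10, -1 + 2j, 3 + 0.5j]) @ np.linalg.inv(M)
```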

4.4 Definition, Computational Algorithms and Applications of the Generalized Matrix-sector Function

In this section, the matrix-sector function is generalized to the matrix-sector function of g(A), where g(A) is the matrix function of a conformal mapping, and the fast and stable algorithms for computing the matrix-sector function are employed for finding the generalized matrix-sector function. The generalized matrix-sector function of A is then applied to the system matrix A for the separation of matrix eigenvalues relative to a sector, a circle, and a sector of a circle. Furthermore, the generalized matrix-sector function of A is utilized for the block-diagonalization and block-triangularization of the system matrix A.

The generalized scalar-n-sector function of λ is defined below.

Definition 4.4.1

Let the function of a conformal mapping be $\lambda \mapsto g(\lambda)$, which maps simple closed curves $L_q$ in the λ-plane onto the boundaries of the n minor sectors bounded by the sector angles $2\pi(q - \frac{1}{2})/n$ and $2\pi(q + \frac{1}{2})/n$ for $q \in [0, n-1]$ in the g(λ)-plane. Thus, the whole λ-plane is separated into open regions $C_q$ by the $L_q$, such that the domains $C_q$ for $q \in [0, n-1]$ in the λ-plane are mapped into the domains $D_q$ bounded by the sector angles $2\pi(q - \frac{1}{2})/n$ and $2\pi(q + \frac{1}{2})/n$ for $q \in [0, n-1]$ in the g(λ)-plane, respectively. Hence, the generalized scalar-sector function of λ with $g(\lambda) \neq 0$ and $\arg[g(\lambda)] \neq 2\pi(q + \frac{1}{2})/n$ for $q \in [0, n-1]$, denoted by $\mathrm{Sector}_n(g(\lambda))$ or $S_n(g(\lambda))$, is

$$\mathrm{Sector}_n(g(\lambda)) = S_n(g(\lambda)) = \frac{g(\lambda)}{\sqrt[n]{(g(\lambda))^n}} = e^{j2\pi q/n} \quad \text{for } q \in [0, n-1], \qquad (4.16)$$

where λ lies within $C_q$ and g(λ) lies within $D_q$, bounded by the sector angles $2\pi(q - \frac{1}{2})/n$ and $2\pi(q + \frac{1}{2})/n$ for $q \in [0, n-1]$, and $\sqrt[n]{(g(\lambda))^n}$ is the principal nth root of $(g(\lambda))^n$. □

When n = 2 and g(λ) is the bilinear transformation $\lambda \mapsto g(\lambda) = (\lambda - \rho)(\lambda + \rho)^{-1}$, g maps the origin-centred circle of radius ρ in the λ-plane onto the imaginary axis of the g(λ)-plane. Also, g maps $C_0$, the exterior of the circle in the λ-plane, into the open right-half g(λ)-plane $D_0$, which is the 0th sector containing the sector angles (-π/2, π/2). Moreover, g maps $C_1$, the interior of the circle in the λ-plane, into the open left-half g(λ)-plane $D_1$, which is the 1st sector containing the sector angles (π/2, 3π/2).
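The mapping properties of the bilinear transformation can be confirmed with a few sample points. The sketch relies on the identity $\mathrm{Re}\,g(\lambda) = (|\lambda|^2 - \rho^2)/|\lambda + \rho|^2$, so the sign of the real part tells whether λ is outside, on, or inside the circle. The radius and sample angles are arbitrary; the angles are offset so that λ = -ρ is never hit.

```python
import cmath
import math


def bilinear(lam: complex, rho: float) -> complex:
    """g(lambda) = (lambda - rho)(lambda + rho)^{-1}."""
    return (lam - rho) / (lam + rho)


rho = 2.0
angles = [2 * math.pi * k / 12 + 0.1 for k in range(12)]   # offset avoids lambda = -rho
on_circle = [bilinear(rho * cmath.exp(1j * t), rho) for t in angles]
outside = [bilinear(1.5 * rho * cmath.exp(1j * t), rho) for t in angles]
inside = [bilinear(0.5 * rho * cmath.exp(1j * t), rho) for t in angles]
# Re g = (|lambda|^2 - rho^2) / |lambda + rho|^2: zero on, positive outside, negative inside
```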

The extension of the generalized scalar-sector function of $\lambda \in \mathbb{C}$ to the generalized matrix-sector function of $A \in \mathbb{C}^{m\times m}$ and its associated functions, with applications, can be stated as follows.

Theorem  4.4.1 

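Let the matrix function of a conformal mapping be $A \mapsto g(A)$, where $A \in \mathbb{C}^{m\times m}$, $\sigma(A) = \{\lambda_i, i = 1, 2, \ldots, m\}$, $g(\lambda_i) \neq 0$, and $\arg[g(\lambda_i)] \neq 2\pi(q + \frac{1}{2})/n$ for $q \in [0, n-1]$. Then the generalized matrix-n-sector function of A, denoted by $\mathrm{Sector}_n(g(A))$ or $S_n(g(A))$, is

$$\mathrm{Sector}_n(g(A)) = S_n(g(A)) = M\Big[\bigoplus_{i=1}^{m} S_n(g(\lambda_i))\Big]M^{-1} = g(A)\big[\sqrt[n]{(g(A))^n}\big]^{-1}, \qquad (4.17)$$

where the matrix M is the modal matrix of A, and $S_n(g(\lambda_i))$ is the generalized scalar-sector function of $\lambda_i$. Also, $\sqrt[n]{(g(A))^n}$ is the principal nth root of $(g(A))^n$, which has the properties that $\big(\sqrt[n]{(g(A))^n}\big)^n = (g(A))^n$ and that each eigenvalue of $\sqrt[n]{(g(A))^n}$ is the principal nth root of the corresponding $(g(\lambda_i))^n$.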

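The associated qth generalized matrix-sector function of A with $g(\lambda_i) \neq 0$ and $\arg[g(\lambda_i)] \neq 2\pi(q + \frac{1}{2})/n$ for $q \in [0, n-1]$, denoted by $S_n^{(q)}(g(A))$, is

$$S_n^{(q)}(g(A)) = M\Big[\bigoplus_{i=1}^{m} S_n^{(q)}(g(\lambda_i))\Big]M^{-1} = \frac{1}{n}\sum_{i=1}^{n}\big[S_n(g(A))\,e^{-j2\pi q/n}\big]^{i-1} \in \mathbb{C}^{m\times m} \quad \text{for } q \in [0, n-1], \qquad (4.18a)$$

where the qth generalized scalar-sector function of $\lambda_i$ is

$$S_n^{(q)}(g(\lambda_i)) = \begin{cases} 1, & \text{when } \lambda_i \in C_q \text{ for } q \in [0, n-1], \\ 0, & \text{otherwise.} \end{cases}$$

The complement of $S_n^{(q)}(g(A))$, denoted by $\bar{S}_n^{(q)}(g(A))$, is

$$\bar{S}_n^{(q)}(g(A)) = I_m - S_n^{(q)}(g(A)) \in \mathbb{C}^{m\times m} \quad \text{for } q \in [0, n-1], \qquad (4.18b)$$

where $I_m$ designates the m × m identity matrix.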


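The number of eigenvalues of A lying inside the domain $C_q$, denoted by $N_q$, is

$$N_q = \mathrm{trace}\big[S_n^{(q)}(g(A))\big] \quad \text{for } q \in [0, n-1], \qquad (4.19)$$

and the A-invariant subspace of $S_n^{(q)}(g(A))$, denoted by $S^{(q)}$, is

$$S^{(q)} = \mathrm{ind}\big[S_n^{(q)}(g(A))\big] \in \mathbb{C}^{m\times N_q} \quad \text{for } q \in [0, n-1], \qquad (4.20)$$

where ind[·] in (4.20) designates the collection of the independent (abbreviated as ind) column vectors of the matrix [·]. The matrices $S^{(q)}$ for $q \in [0, n-1]$ can be used to construct a block-modal matrix, $M_s$, for carrying out the block-diagonalization of the system matrix A, i.e.,

$$M_s^{-1}AM_s = \mathrm{block\ diag}[A_0, A_1, \ldots, A_{n-1}] \in \mathbb{C}^{m\times m}, \qquad (4.21a)$$

where

$$M_s = \big[S^{(0)}, S^{(1)}, \ldots, S^{(n-1)}\big] \in \mathbb{C}^{m\times m}, \qquad (4.21b)$$

$$A_q = (S^{(q)})^{+}A\,S^{(q)} \in \mathbb{C}^{N_q\times N_q}, \quad (S^{(q)})^{+} = \big[(S^{(q)})^{*}S^{(q)}\big]^{-1}(S^{(q)})^{*} \in \mathbb{C}^{N_q\times m}, \quad \sigma(A_q) \subset C_q \text{ for } q \in [0, n-1]. \qquad (4.21c)$$

The superscript * in (4.21c) designates the conjugate transpose.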
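A sketch of the block-diagonalization (4.20)-(4.21), taking g(A) = A and n = 4 for simplicity. One substitution is made openly. Instead of selecting independent columns (ind[·]) directly, the sketch takes an orthonormal basis of the range of each projector from its SVD, using trace(P) as the rank. Any basis of the same invariant subspace gives a valid $M_s$. The test matrix and helper names are illustrative.

```python
import numpy as np


def sector_eig(A, n):
    """S_n(A) from an eigendecomposition, as in (4.3); assumes A is diagonalizable."""
    lam, M = np.linalg.eig(A)
    w = lam ** n
    roots = np.abs(w) ** (1.0 / n) * np.exp(1j * np.angle(w) / n)
    return M @ np.diag(lam / roots) @ np.linalg.inv(M)


def projector(S, n, q):
    """(4.6)/(4.18a): (1/n) * sum_{i=0}^{n-1} [S exp(-j 2 pi q / n)]^i."""
    X = S * np.exp(-2j * np.pi * q / n)
    return sum(np.linalg.matrix_power(X, i) for i in range(n)) / n


def column_basis(P):
    """Orthonormal basis of range(P); stands in for ind[.] (rank taken as trace(P))."""
    r = int(round(np.trace(P).real))
    U, _, _ = np.linalg.svd(P)
    return U[:, :r]


rng = np.random.default_rng(4)
M = np.eye(4) + 0.3 * (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
A = M @ np.diag([-2 + 1j, -10, -1 + 2j, 3 + 0.5j]) @ np.linalg.inv(M)

n = 4
S = sector_eig(A, n)
blocks = [column_basis(projector(S, n, q)) for q in range(n)]
blocks = [B for B in blocks if B.shape[1] > 0]        # skip empty sectors (N_q = 0)
Ms = np.hstack(blocks)                                 # (4.21b)
D = np.linalg.solve(Ms, A @ Ms)                        # (4.21a): Ms^{-1} A Ms
Aq = [np.linalg.pinv(B) @ A @ B for B in blocks]       # (4.21c): (S^(q))^+ A S^(q)
```

The off-diagonal blocks of `D` vanish, and each diagonal block equals the corresponding $A_q$ from (4.21c).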

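The other A-invariant subspace of $S_n^{(q)}(g(A))$, denoted by $V^{(q)}$, can be constructed as the collection of the independent row vectors of $S_n^{(q)}(g(A))$ and expressed as

$$V^{(q)} = \Big\{\mathrm{ind}\big[(S_n^{(q)}(g(A)))^{T}\big]\Big\}^{T} \in \mathbb{C}^{N_q\times m} \quad \text{for } q \in [0, n-1]. \qquad (4.22)$$

Hence, the associated block-modal matrix, $M_v$, can be constructed and used for the block-diagonalization of the system matrix A:

$$M_vAM_v^{-1} = \mathrm{block\ diag}[\hat{A}_0, \hat{A}_1, \ldots, \hat{A}_{n-1}] \in \mathbb{C}^{m\times m}, \qquad (4.23a)$$

where

$$M_v = \begin{bmatrix} V^{(0)} \\ V^{(1)} \\ \vdots \\ V^{(n-1)} \end{bmatrix} \in \mathbb{C}^{m\times m}, \qquad (4.23b)$$

and

$$\hat{A}_q = V^{(q)}A\,(V^{(q)})^{+} \in \mathbb{C}^{N_q\times N_q}, \quad (V^{(q)})^{+} = (V^{(q)})^{*}\big[V^{(q)}(V^{(q)})^{*}\big]^{-1} \in \mathbb{C}^{m\times N_q}, \quad \sigma(\hat{A}_q) \subset C_q \text{ for } q \in [0, n-1]. \qquad (4.23c)$$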


Also, by combining the A-invariant subspaces of $S_n^{(q)}(g(A))$ in (4.20) and (4.22), a similarity-transformation matrix T can be constructed for the block-triangularization of the system matrix A, so that each submatrix of the block-triangularized system matrix contains the eigenvalues lying within a specified region of the λ-plane.

The similarity-transformation matrix and its inverse are

$$T = \begin{bmatrix} (\bar{S}^{(q)})^{+} \\ V^{(q)} \end{bmatrix} \in \mathbb{C}^{m\times m}, \quad \bar{S}^{(q)} = \mathrm{ind}\big[I_m - S_n^{(q)}(g(A))\big] \in \mathbb{C}^{m\times(m-N_q)}, \qquad (4.24a)$$

and

$$T^{-1} = \big[\bar{S}^{(q)},\ (V^{(q)})^{+}\big]. \qquad (4.24b)$$

The block-triangularized system matrix becomes

$$A_t = TAT^{-1} = \begin{bmatrix} A_h & A_{hl} \\ 0 & A_l \end{bmatrix}, \qquad (4.25)$$

where $A_h = (\bar{S}^{(q)})^{+}A\,\bar{S}^{(q)}$ with $\sigma(A_h) \subset$ the complement of $C_q$, $A_l = V^{(q)}A\,(V^{(q)})^{+}$ with $\sigma(A_l) \subset C_q$, and $A_{hl} = (\bar{S}^{(q)})^{+}A\,(V^{(q)})^{+}$.


Proof 

When  g(A)  =  A,  the  various  results  in  this  theorem  have  been  proved  in 
[9,26,35,36,37].  The  corresponding  results  for  the  generalized  version  of  the  matrix- 
sector  functions  can  be  proven  in  a  similar  manner.  ■ 
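The block-triangularization (4.24)-(4.25) can be sketched with g(A) = A, n = 2, and $C_q$ taken as the open left half-plane (q = 1). For brevity, $S_2(A)$ is computed with the Newton sign iteration rather than (4.7). Orthonormal SVD bases stand in for ind[·] (an assumption; any bases of the same subspaces work). The check confirms that $TT^{-1} = I_m$ and that the lower-left block of $A_t$ vanishes. Names and test data are hypothetical.

```python
import numpy as np


def sign_newton(X, iters=60):
    """Matrix sign function via X <- (X + X^{-1}) / 2 (used here in place of (4.7))."""
    X = X.astype(complex)
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X


def col_basis(P, r):
    """First r left singular vectors: a basis of range(P) (stand-in for ind[.])."""
    return np.linalg.svd(P)[0][:, :r]


def row_basis(P, r):
    """First r right singular vectors as rows: a basis of the row space of P."""
    return np.linalg.svd(P)[2][:r, :]


rng = np.random.default_rng(5)
M = np.eye(4) + 0.3 * (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
A = M @ np.diag([-2 + 1j, -10, -1 + 2j, 3 + 0.5j]) @ np.linalg.inv(M)
m = A.shape[0]
I = np.eye(m)

# Region C_q: open left half-plane, i.e. q = 1 for n = 2 and g(A) = A.
P = 0.5 * (I - sign_newton(A))                    # S_2^{(1)}(A)
Nq = int(round(np.trace(P).real))
S_bar = col_basis(I - P, m - Nq)                  # (4.24a): ind[I - S]
V = row_basis(P, Nq)                              # (4.22)
T = np.vstack([np.linalg.pinv(S_bar), V])         # (4.24a)
T_inv = np.hstack([S_bar, np.linalg.pinv(V)])     # (4.24b)
At = T @ A @ T_inv                                # (4.25)
```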


Replacing A in (4.7) with g(A), we obtain the fast and stable algorithm for computing the generalized matrix-sector function $S_n(g(A))$ in (4.17). When $\arg[g(\lambda_i)] = 2\pi(q + \frac{1}{2})/n$ for some $q \in [0, n-1]$, the matrix g(A) should be rotated by a small positive real angle Δβ, i.e., replaced by $g(Ae^{j\Delta\beta})$, so that the algorithm in (4.7) can still be applied to compute $S_n(g(A))$.
Corollary 4.4.1

Let $g(A) = (A - \rho I_m)(A + \rho I_m)^{-1}$, where $A \in \mathbb{C}^{m\times m}$, $\det(A + \rho I_m) \neq 0$, and $\sigma(A) \cap C = \emptyset$, where C is a circle of radius ρ with center at the origin of the λ-plane. The qth generalized matrix-sector function of A with n = 2 and q = 0 becomes

$$S_2^{(0)}(g(A)) = \tfrac{1}{2}\big[I_m + S_2(g(A))\big], \qquad (4.26a)$$

and the complement of $S_2^{(0)}(g(A))$, denoted by $\bar{S}_2^{(0)}(g(A))$, is

$$\bar{S}_2^{(0)}(g(A)) = I_m - S_2^{(0)}(g(A)) = S_2^{(1)}(g(A)) = \tfrac{1}{2}\big[I_m - S_2(g(A))\big]. \qquad (4.26b)$$

The number of eigenvalues of A lying in the exterior of the circle of radius ρ is $N_0 = \mathrm{trace}[S_2^{(0)}(g(A))]$, and the number lying in the interior of the circle is $N_1 = \mathrm{trace}[S_2^{(1)}(g(A))] = m - N_0$.

Proof

The bilinear transform (a conformal mapping) $g(\lambda) = (\lambda - \rho)(\lambda + \rho)^{-1}$ maps the circle of radius ρ in the λ-plane onto the imaginary axis of the g(λ)-plane, and the interior (exterior) of the circle into the open left-half (open right-half) g(λ)-plane. Hence, Corollary 4.4.1 can be proved using Definitions 4.2.1 and 4.4.1 and Theorem 4.4.1. ■
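Corollary 4.4.1 gives a simple eigenvalue counter for a disc. In the sketch below, $S_2(g(A))$ is obtained with the Newton sign iteration, a stand-in for (4.7) chosen for brevity. The test matrix and radii are arbitrary, and `circle_counts` is a hypothetical name. It returns $(N_0, N_1)$, the numbers of eigenvalues outside and inside the circle $|\lambda| = \rho$.

```python
import numpy as np


def sign_newton(X, iters=60):
    """Matrix sign function by the Newton iteration X <- (X + X^{-1}) / 2."""
    X = X.astype(complex)
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X


def circle_counts(A, rho):
    """Return (N0, N1): eigenvalue counts outside / inside the circle |lambda| = rho."""
    m = A.shape[0]
    I = np.eye(m)
    g = np.linalg.solve((A + rho * I).T, (A - rho * I).T).T   # (A - rho I)(A + rho I)^{-1}
    S2 = sign_newton(g)
    S0 = 0.5 * (I + S2)                                        # (4.26a)
    N0 = int(round(np.trace(S0).real))
    return N0, m - N0


rng = np.random.default_rng(6)
M = np.eye(4) + 0.3 * (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
A = M @ np.diag([-2 + 1j, -10, -1 + 2j, 3 + 0.5j]) @ np.linalg.inv(M)
```

The eigenvalue magnitudes here are about 2.24, 10, 2.24, and 3.04. So ρ = 2.8 splits them two and two, ρ = 1 puts all four outside, and ρ = 12 puts all four inside.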

To determine the number of matrix eigenvalues lying inside the intersection of two specific regions ($C_0$ and $C_1$) in the complex plane, and to determine the associated A-invariant subspace of the intersection region, we present the following important result.


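Corollary 4.4.2

Let $S_{n_1}^{(q_1)}(g_1(A))$ and $S_{n_2}^{(q_2)}(g_2(A))$ be two associated generalized matrix-sector functions of A, which can be expressed as

$$S_{n_1}^{(q_1)}(g_1(A)) = M\Big[\bigoplus_{i=1}^{m} S_{n_1}^{(q_1)}(g_1(\lambda_i))\Big]M^{-1} \in \mathbb{C}^{m\times m} \qquad (4.27a)$$

and

$$S_{n_2}^{(q_2)}(g_2(A)) = M\Big[\bigoplus_{i=1}^{m} S_{n_2}^{(q_2)}(g_2(\lambda_i))\Big]M^{-1} \in \mathbb{C}^{m\times m}; \qquad (4.27b)$$

then we have

$$S_{n_1}^{(q_1)}(g_1(A)) \times S_{n_2}^{(q_2)}(g_2(A)) = M\Big[\bigoplus_{i=1}^{m} S_{n_1}^{(q_1)}(g_1(\lambda_i))\,S_{n_2}^{(q_2)}(g_2(\lambda_i))\Big]M^{-1}, \qquad (4.27c)$$

where $m_* = \mathrm{trace}\big[S_{n_1}^{(q_1)}(g_1(A)) \times S_{n_2}^{(q_2)}(g_2(A))\big]$ and $\bar{m}_* = m - m_*$.

Let

$$M_1 = \mathrm{ind}\big[S_{n_1}^{(q_1)}(g_1(A))\big], \quad M_2 = \mathrm{ind}\big[S_{n_2}^{(q_2)}(g_2(A))\big], \quad M_1^{+}AM_1 = A_0, \quad M_2^{+}AM_2 = A_1,$$

with $\sigma(A_0) \subset C_0$ and $\sigma(A_1) \subset C_1$; also let $M_* = \mathrm{ind}\big[S_{n_1}^{(q_1)}(g_1(A)) \times S_{n_2}^{(q_2)}(g_2(A))\big]$. Then we have

$$M_*^{+}AM_* = A_*, \quad \sigma(A_*) \subset C_0 \cap C_1. \qquad (4.27d)$$

The number of eigenvalues lying within $C_0 \cap C_1$ is $m_*$.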

Proof 

Corollary  4.4.2  can  be  proved  by  using  the  fact  that  the  generalized  matrix- 
sector  function  of  A  preserves  the  eigenvectors  of  A.  ■ 
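A numerical sketch of Corollary 4.4.2 for a 5 × 5 test matrix. The two regions are chosen for illustration: $C_0$ is the open left half-plane (the q = 1 part of the 2-sector function of A), and $C_1$ is the interior of $|\lambda| = 2.8$ (the q = 1 part of the 2-sector function of the bilinear map). The product of the two projectors counts the eigenvalues in $C_0 \cap C_1$, and a basis of its range (from the SVD, standing in for ind[·]) yields $A_*$. All names and data are assumptions.

```python
import numpy as np


def sector_eig(A, n):
    """S_n(A) from an eigendecomposition, as in (4.3); assumes A is diagonalizable."""
    lam, M = np.linalg.eig(A)
    w = lam ** n
    roots = np.abs(w) ** (1.0 / n) * np.exp(1j * np.angle(w) / n)
    return M @ np.diag(lam / roots) @ np.linalg.inv(M)


def sign_newton(X, iters=60):
    """Matrix sign function by the Newton iteration X <- (X + X^{-1}) / 2."""
    X = X.astype(complex)
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X


rng = np.random.default_rng(7)
Mod = np.eye(5) + 0.3 * (rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
eigs = np.array([-2 + 1j, -10, -1 + 2j, 3 + 0.5j, -0.5 - 0.2j])
A = Mod @ np.diag(eigs) @ np.linalg.inv(Mod)
m, I = 5, np.eye(5)

# C_0: open left half-plane (q = 1 of the 2-sector function of A)
P1 = 0.5 * (I - sector_eig(A, 2))
# C_1: interior of |lambda| = rho (q = 1 of the 2-sector function of the bilinear map)
rho = 2.8
g2 = np.linalg.solve((A + rho * I).T, (A - rho * I).T).T
P2 = 0.5 * (I - sign_newton(g2))

P = P1 @ P2                                   # (4.27c)
m_star = int(round(np.trace(P).real))
M_star = np.linalg.svd(P)[0][:, :m_star]      # stand-in for ind[.]
A_star = np.linalg.pinv(M_star) @ A @ M_star  # (4.27d)
```

Here -10 is in the left half-plane but outside the circle, and 3 + j0.5 is in the right half-plane. That leaves $m_* = 3$ eigenvalues in the intersection.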

For engineering applications, we are often interested in selecting a sector of a circle in the λ-plane because of considerations such as the damping ratio, damped frequency, and decay rate of the system. The separation of matrix eigenvalues relative to a sector of a circle can be stated as follows.

Corollary 4.4.3

Let $A \in \mathbb{C}^{m\times m}$ and $\sigma(A) = \{\lambda_i, i = 1, 2, \ldots, m\}$. Also, let $\sigma(A) \cap (\ell_q \cup \ell_{q+1}) = \emptyset$, where $\ell_q$ and $\ell_{q+1}$ are two straight lines emanating from the origin of the λ-plane at angles $2\pi(q - \frac{1}{2})/n$ and $2\pi(q + \frac{1}{2})/n$ for $q \in [0, n-1]$. Moreover, let $\sigma(A) \cap C = \emptyset$, where C is a circle of radius ρ centred at the origin. Then the generalized matrix-sector function of A with respect to this sector and the circle of radius ρ, denoted by $S_n^{(q,\rho)}(A)$, is

$$S_n^{(q,\rho)}(A) = S_n^{(q)}(A) \times S_2^{(1)}(g_2(A)), \qquad (4.28a)$$

where

$$g_2(A) = (A - \rho I_m)(A + \rho I_m)^{-1} \quad \text{for } q \in [0, n-1]. \qquad (4.28b)$$

The complement of $S_n^{(q,\rho)}(A)$ is $I_m - S_n^{(q,\rho)}(A)$. The number of matrix eigenvalues lying inside the closed sector is

$$N_s = \mathrm{trace}\big[S_n^{(q,\rho)}(A)\big]. \qquad (4.28c)$$

Proof

Corollary 4.4.3 can be proved by using Theorem 4.4.1 and Corollaries 4.4.1 and 4.4.2. ■
4.5 Illustrative Example

Consider a system matrix A,

$$A = \begin{bmatrix}
10.5 - j1.5 & -5.5 + j11.5 & -3.5 - j4.5 & -20.5 + j1.5 \\
5.5 - j8.0 & 1.5 + j11.0 & -4.1 - j0.8 & -11.0 + j11.5 \\
21.5 + j3.0 & -16.5 + j17.0 & -4.5 - j8.0 & -34.0 - j10.5 \\
12.5 - j2.5 & -5.5 + j11.5 & -3.5 - j4.5 & -22.5 + j2.5
\end{bmatrix}. \qquad (4.29)$$


Find

(a) The number of matrix eigenvalues lying inside the sector of a circle with radius ρ (= $|\sqrt[4]{\det(A)}|$ for m = 4) and sector angle θ ∈ (3π/4, 5π/4).

(b) The block-triangularization of the system matrix A such that $\sigma(A_l)$ lies inside the sector of the circle and $\sigma(A_h)$ lies outside it.

(c) The block-diagonalization of the system matrix A such that $\sigma(A_0)$ lies inside the sector of the circle and $\sigma(A_1)$ lies outside it.

Solution

To find the number of matrix eigenvalues lying within the closed sector, we use Corollary 4.4.3. The geometric mean of the magnitudes of the matrix eigenvalues is $\rho = |\sqrt[4]{\det(A)}| = 3.2517$. Since the sector angle is $\phi_s = 5\pi/4 - 3\pi/4 = \pi/2$, we decompose the entire λ-plane into n (= $2\pi/\phi_s$ = 4) sectors. As a result, the sector index q equals two. Thus, the q(= 2)th matrix-sector function of A is

$$S_4^{(2)}(A) = \begin{bmatrix}
1.0 + j0.0 & 0.0 + j0.0 & 0.0 + j0.0 & 0.0 + j0.0 \\
0.0 + j0.5 & 0.0 - j0.5 & 0.2 + j0.1 & 0.5 - j1.0 \\
-1.0 + j0.5 & 0.0 - j2.5 & 1.0 + j0.5 & 2.5 + j0.0 \\
0.0 + j0.0 & 0.0 + j0.0 & 0.0 + j0.0 & 1.0 + j0.0
\end{bmatrix}.$$


To use (4.28), we compute $g_2(A)$ and $S_2^{(1)}(g_2(A))$ as

$$g_2(A) = (A - \rho I_m)(A + \rho I_m)^{-1} = \begin{bmatrix}
-7.1074 + j8.4017 & -1.7329 - j7.4696 & 3.3344 + j0.8008 & 9.0711 - j8.4017 \\
-0.0838 + j4.6235 & -4.6659 - j0.6559 & 1.6462 - j1.2026 & 0.6559 - j6.6297 \\
-9.3308 + j4.4560 & 3.8846 - j8.1785 & 1.8800 + j4.6235 & 13.9150 - j5.3180 \\
-4.9359 + j5.8681 & -1.7329 - j7.4696 & 3.3344 + j0.8008 & 6.8996 - j5.8681
\end{bmatrix}$$

and

$$S_2^{(1)}(g_2(A)) = \begin{bmatrix}
2.5 - j0.5 & -0.5 + j1.5 & -0.5 - j0.5 & -2.5 + j0.5 \\
0.5 - j1.0 & 1.5 + j1.0 & -0.5 + j0.0 & -1.0 + j1.5 \\
2.5 + j0.0 & -1.5 + j2.0 & 0.5 - j1.0 & -4.0 - j0.5 \\
1.5 - j0.5 & -0.5 + j1.5 & -0.5 - j0.5 & -1.5 + j0.5
\end{bmatrix}.$$

Thus, the desired $S_4^{(2,\rho)}(A)$ in (4.28a) becomes

$$S_4^{(2,\rho)}(A) = S_4^{(2)}(A) \times S_2^{(1)}(g_2(A)) = \begin{bmatrix}
2.5 - j0.5 & -0.5 + j1.5 & -0.5 - j0.5 & -2.5 + j0.5 \\
0.5 - j0.5 & 0.5 + j0.5 & -0.3 + j0.1 & -0.5 + j0.5 \\
1.5 + j0.5 & -1.5 - j0.5 & 0.5 - j0.5 & -1.5 - j0.5 \\
1.5 - j0.5 & -0.5 + j1.5 & -0.5 - j0.5 & -1.5 + j0.5
\end{bmatrix}. \qquad (4.30)$$


The number of eigenvalues lying within the sector of the circle is

$$N_s = \mathrm{trace}\big[S_4^{(2,\rho)}(A)\big] = 2. \qquad (4.31)$$

It is interesting to note that $\sigma(A) = \{\lambda_1 = \lambda_2 = -2 + j1,\ \lambda_3 = -10,\ \lambda_4 = -1 + j2\}$ and that the repeated eigenvalues $\{\lambda_1, \lambda_2\}$ lie in the desired sector. Since the characteristic polynomial of A is a complex polynomial, the test procedures due to Gutman and Jury [73] and Zeheb and Hertz [74] cannot be applied directly to determine the $N_s$ in (4.31).
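The numbers in (4.29)-(4.31) can be reproduced with a short script. This is a sketch, not the authors' program. It uses the second-order iteration (4.8), run for a fixed number of steps, for both $S_4(A)$ and $S_2(g_2(A))$, and forms the qth parts with (4.6). The helper names are illustrative.

```python
import numpy as np

# System matrix A of (4.29)
A = np.array([
    [10.5 - 1.5j, -5.5 + 11.5j, -3.5 - 4.5j, -20.5 + 1.5j],
    [5.5 - 8.0j, 1.5 + 11.0j, -4.1 - 0.8j, -11.0 + 11.5j],
    [21.5 + 3.0j, -16.5 + 17.0j, -4.5 - 8.0j, -34.0 - 10.5j],
    [12.5 - 2.5j, -5.5 + 11.5j, -3.5 - 4.5j, -22.5 + 2.5j],
])
m = A.shape[0]
I = np.eye(m)


def sector_r2(X, n, iters=100):
    """Second-order iteration (4.8), run for a fixed number of steps."""
    Im = np.eye(X.shape[0])
    Q = X.astype(complex)
    for _ in range(iters):
        Qn = np.linalg.matrix_power(Q, n)
        Q = np.linalg.solve(Im + (n - 1) * Qn, Q @ (2 * Im + (n - 2) * Qn))
    return Q


def projector(S, n, q):
    """(4.6): (1/n) * sum_{i=0}^{n-1} [S exp(-j 2 pi q / n)]^i."""
    X = S * np.exp(-2j * np.pi * q / n)
    return sum(np.linalg.matrix_power(X, i) for i in range(n)) / n


rho = abs(np.linalg.det(A)) ** (1.0 / m)                 # geometric mean of |lambda_i|
S4_2 = projector(sector_r2(A, 4), 4, 2)                  # sector (3*pi/4, 5*pi/4)
g2 = np.linalg.solve((A + rho * I).T, (A - rho * I).T).T  # (A - rho I)(A + rho I)^{-1}
S2_1 = projector(sector_r2(g2, 2), 2, 1)                 # interior of |lambda| = rho
S_sector_circle = S4_2 @ S2_1                            # (4.28a)
N_s = int(round(np.trace(S_sector_circle).real))         # (4.31)
```

Three eigenvalues (-2 + j1 twice and -10) lie in the sector, and three (-2 + j1 twice and -1 + j2) lie inside the circle. The product keeps only the repeated pair, so $N_s = 2$.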

To find the block-triangularization of A, we use Theorem 4.4.1. The computed $A_t$ in (4.25) is

$$A_t = TAT^{-1} = \begin{bmatrix}
-16.25 - j3.75 & 13.75 - j8.75 & -7.292 - j1.875 & 8.375 - j5.542 \\
-0.75 - j7.25 & 5.25 + j5.75 & -0.875 - j3.458 & 5.458 + j2.875 \\
0.00 + j0.00 & 0.00 + j0.00 & -2.000 + j1.000 & 0.000 + j0.000 \\
0.00 + j0.00 & 0.00 + j0.00 & 0.000 + j0.000 & -2.000 + j1.000
\end{bmatrix}, \qquad (4.32)$$

where

$$T = \begin{bmatrix} (\bar{S}^{(q)})^{+} \\ V^{(q)} \end{bmatrix} = \begin{bmatrix}
-0.25 + j0.25 & -0.083 - j0.250 & -0.417 - j0.417 & -0.25 + j0.25 \\
-0.25 + j0.25 & 0.250 - j0.083 & -0.083 - j0.583 & -0.25 + j0.25 \\
2.50 - j0.50 & -0.500 + j1.500 & -0.500 - j0.500 & -2.50 + j0.50 \\
0.50 - j0.50 & 0.500 + j0.500 & -0.300 + j0.100 & -0.50 + j0.50
\end{bmatrix},$$

$$T^{-1} = \big[\bar{S}^{(q)} \mid (V^{(q)})^{+}\big] = \begin{bmatrix}
-1.5 + j0.5 & 0.5 - j1.5 & 0.250 + j0.250 & 0.250 - j0.750 \\
-0.5 + j0.5 & 0.5 - j0.5 & -0.417 + j0.417 & 2.083 - j0.417 \\
-1.5 - j0.5 & 1.5 + j0.5 & 0.250 + j0.083 & -0.583 - j0.750 \\
-1.5 + j0.5 & 0.5 - j1.5 & -0.250 - j0.250 & -0.250 + j0.750
\end{bmatrix}.$$

Note that $\sigma(A_h) = \{\lambda_3, \lambda_4\}$ and $\sigma(A_l) = \{\lambda_1, \lambda_2\}$.

The block-diagonalization of A in (4.21a) is

$$M_s^{-1}AM_s = \mathrm{block\ diag}[A_0, A_1] = \begin{bmatrix}
-2.0 + j1.0 & 0.0 + j0.0 & 0.0 + j0.0 & 0.0 + j0.0 \\
0.0 + j0.0 & -2.0 + j1.0 & 0.0 + j0.0 & 0.0 + j0.0 \\
0.0 + j0.0 & 0.0 + j0.0 & -16.25 - j3.75 & 13.75 - j8.75 \\
0.0 + j0.0 & 0.0 + j0.0 & -0.75 - j7.25 & 5.25 + j5.75
\end{bmatrix}, \qquad (4.33)$$

where

$$M_s = \begin{bmatrix}
2.5 - j0.5 & -0.5 + j1.5 & -1.5 + j0.5 & 0.5 - j1.5 \\
0.5 - j0.5 & 0.5 + j0.5 & -0.5 + j0.5 & 0.5 - j0.5 \\
1.5 + j0.5 & -1.5 - j0.5 & -1.5 - j0.5 & 1.5 + j0.5 \\
1.5 - j0.5 & -0.5 + j1.5 & -1.5 + j0.5 & 0.5 - j1.5
\end{bmatrix}.$$

Note that $\sigma(A_0) = \{\lambda_1, \lambda_2\}$ and $\sigma(A_1) = \{\lambda_3, \lambda_4\}$.
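The printed block-diagonalization (4.33) can be checked directly from A in (4.29) and the listed $M_s$. Its first two columns span the eigenspace of -2 + j1, and the last two span the complementary invariant subspace. The sketch below simply forms $M_s^{-1}AM_s$ with a linear solve.

```python
import numpy as np

# A of (4.29) and M_s of (4.33)
A = np.array([
    [10.5 - 1.5j, -5.5 + 11.5j, -3.5 - 4.5j, -20.5 + 1.5j],
    [5.5 - 8.0j, 1.5 + 11.0j, -4.1 - 0.8j, -11.0 + 11.5j],
    [21.5 + 3.0j, -16.5 + 17.0j, -4.5 - 8.0j, -34.0 - 10.5j],
    [12.5 - 2.5j, -5.5 + 11.5j, -3.5 - 4.5j, -22.5 + 2.5j],
])
Ms = np.array([
    [2.5 - 0.5j, -0.5 + 1.5j, -1.5 + 0.5j, 0.5 - 1.5j],
    [0.5 - 0.5j, 0.5 + 0.5j, -0.5 + 0.5j, 0.5 - 0.5j],
    [1.5 + 0.5j, -1.5 - 0.5j, -1.5 - 0.5j, 1.5 + 0.5j],
    [1.5 - 0.5j, -0.5 + 1.5j, -1.5 + 0.5j, 0.5 - 1.5j],
])
D = np.linalg.solve(Ms, A @ Ms)      # Ms^{-1} A Ms, as in (4.33)
A0, A1 = D[:2, :2], D[2:, 2:]
```

`A0` comes out as (-2 + j1)I, and `A1` has eigenvalues -10 and -1 + j2.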
4.6 Conclusion

The matrix-sector function of A has been generalized to the matrix-sector function of g(A). Based on the computationally fast and numerically stable algorithms for computing the principal nth root of a matrix, rapidly convergent and stable algorithms for computing the matrix-sector function and the generalized matrix-sector function have been developed. The generalized matrix-sector function of A has been utilized to carry out the separation of matrix eigenvalues relative to a sector, a circle, and a sector of a circle in the λ-plane. The generalized matrix-sector function of A has also been employed for the block-diagonalization and block-triangularization of the system matrix; these are useful in developing applications to mathematical science [32] and control-system problems [31].




Chapter  5 

Determining Continuous-time State Equations from Discrete-time State Equations Via the Principal qth Root Method


Fast computational methods are developed for finding the equivalent continuous-time state equations from discrete-time state equations. The computational methods utilize the direct-truncation method, the matrix continued-fraction method, and the geometric-series method in conjunction with the principal qth root of the discrete-time system matrix for quick determination of the approximants of a matrix-logarithm function. It is shown that the use of the principal qth root of a matrix enlarges the convergence region of the expansion of a matrix-logarithm function and improves the accuracy of the approximants of the matrix-logarithm function [28].
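The abstract's claim can be illustrated with the standard identity $\ln F = q\ln(F^{1/q})$ for the principal qth root. The chapter's own procedure is not spelled out at this point, so the sketch assumes this identity and applies a short truncated series $\ln F \approx 2[R + R^3/3 + R^5/5]$, with $R = (F - I)(F + I)^{-1}$, either to F directly or to $F^{1/q}$. The principal root comes from SciPy's `fractional_matrix_power`. The test system, the value q = 8, and the helper names are arbitrary choices.

```python
import numpy as np
from scipy.linalg import expm, fractional_matrix_power


def log_series(F, terms):
    """ln(F) ~ 2 [R + R^3/3 + ...] with R = (F - I)(F + I)^{-1}; `terms` odd powers."""
    I = np.eye(F.shape[0])
    R = np.linalg.solve(F + I, F - I)        # (F + I)^{-1}(F - I); the factors commute
    R2 = R @ R
    total = np.zeros_like(R)
    P = R.copy()
    for k in range(terms):
        total = total + P / (2 * k + 1)
        P = P @ R2
    return 2.0 * total


def log_via_root(F, q, terms):
    """ln(F) = q ln(F^{1/q}) using the principal q-th root (assumed use of the idea)."""
    Fq = np.real_if_close(fractional_matrix_power(F, 1.0 / q))
    return q * log_series(Fq, terms)


A = np.array([[0.0, 1.0], [-2.0, -3.0]])     # hypothetical continuous-time matrix
T = 1.5                                      # long sampling period: eigenvalues of R near -0.9
F = expm(A * T)
err_direct = np.linalg.norm(log_series(F, 3) - A * T)
err_root = np.linalg.norm(log_via_root(F, 8, 3) - A * T)
```

With the same three series terms, taking the 8th root first shrinks the eigenvalues of R from about 0.9 to about 0.19 in magnitude. The error drops accordingly, from above 0.1 to below 1e-3.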


5.1  Introduction 

The identification [38] of a continuous-time system using the sampled input-output data of the system often results in an equivalent discrete-time model. Hence, the conversion of the obtained discrete-time model to the original continuous-time system is necessary. Also, a given discrete-time system is often transformed into an equivalent continuous-time model so that well-developed continuous-time approaches, such as the frequency-domain techniques [39], can be applied efficiently to the transformed model for the analysis and design of sampled-data control systems [40].

Let the discrete-time system be

$$x(kT + T) = Fx(kT) + Gu(kT), \qquad y(kT) = Cx(kT), \qquad (5.1)$$

where $x \in \mathbb{R}^n$ is the state vector, u is the input vector, and y is the output vector; the constant matrices F, G, and C are of appropriate dimensions; and T is the sampling period. The equivalent continuous-time model is described by

$$\dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t), \qquad (5.2)$$

where $x(t) = x(kT)$ and $u(t) = u(kT)$ for t = kT.

The relationships [41]-[44] between the matrices A (B) and F (G) are

$$A = \frac{1}{T}\ln(F) \qquad (5.3a)$$

and

$$B = A[F - I_n]^{-1}G, \qquad (5.3b)$$

where $I_n$ denotes the n × n identity matrix.
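A round-trip check of (5.3a) and (5.3b) on a hypothetical system. The sketch discretizes with a zero-order hold using the standard augmented-exponential identity $\exp\!\big(\begin{bmatrix} A & B \\ 0 & 0\end{bmatrix}T\big) = \begin{bmatrix} F & G \\ 0 & I\end{bmatrix}$. It then recovers A with SciPy's principal matrix logarithm `scipy.linalg.logm` and B from (5.3b). The matrices and the sampling period are illustrative.

```python
import numpy as np
from scipy.linalg import expm, logm

# Hypothetical continuous-time model used only for this check
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
T = 0.1
n, p = A.shape[0], B.shape[1]

# Zero-order-hold discretization via the augmented exponential:
# expm([[A, B], [0, 0]] * T) = [[F, G], [0, I]]
aug = np.zeros((n + p, n + p))
aug[:n, :n] = A
aug[:n, n:] = B
E = expm(aug * T)
F, G = E[:n, :n], E[:n, n:]

# Recover the continuous-time matrices with (5.3a) and (5.3b)
A_rec = np.real(logm(F)) / T                          # principal logarithm
B_rec = A_rec @ np.linalg.solve(F - np.eye(n), G)
```

The principal logarithm returns A here because the eigenvalues of AT (-0.1 and -0.2) are real. When eigenvalues of AT have imaginary parts outside (-π, π], the logarithm is not unique, which is the difficulty noted below for Harris' method.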

The problem of finding the matrix A from the matrix F in (5.3a) has been considered by several authors [43]-[46]. The most commonly used method is the direct-truncation method: the matrix-logarithm function ln(F), under certain convergence conditions, is expanded into an infinite power series, and the matrix A is then obtained by truncating that series. The direct-truncation method is simple; however, the truncation error depends heavily upon the type of power-series expansion used and the number of terms taken. Harris [46] has proposed a method that converts a matrix-logarithm function into a scalar-logarithm function via a modal-decomposition technique. The nonuniqueness of the logarithm of complex eigenvalues, and the complicated computations required for eigenvectors and for associated and/or repeated eigenvalues with unknown multiplicity, limit the practical use of Harris' method. Other methods [43]-[45] appear to be more effective and straightforward than Harris' method [46] when the matrix of interest is defective. Recently, Sinha and Lastman [44] have proposed a fixed-point recursive algorithm for computing the matrix A from the matrix F, which involves the approximation of the Taylor series expansion of exp(AT) with $|\sigma(AT)| < 0.5$, where $\sigma(AT)$ denotes the eigenspectrum of the matrix AT. Moreover, Puthenpura and Sinha [45] have proposed the matrix Chebyshev method for the approximation of the shifted matrix-logarithm function $\ln(I_n + X)$ with $0 < \sigma(X) < 1$, where the matrix $X = F - I_n$. Furthermore, Shieh et al. [43] have proposed a direct-truncation method, a matrix continued-fraction method, and a geometric-series method for determining the matrix A from the matrix F. These three methods [43] can be summarized as follows.

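1) The direct-truncation method is as follows:

$$A = \frac{1}{T}\ln(F) = \frac{2}{T}\left[R + \frac{1}{3}R^{3} + \frac{1}{5}R^{5} + \cdots + \frac{1}{n}R^{n} + \frac{1}{n+2}R^{n+2} + \cdots\right] \tag{5.4a}$$

$$A \approx \frac{2}{T}R \tag{5.4b}$$

$$A \approx \frac{2}{T}\left[R + \frac{1}{3}R^{3}\right] \tag{5.4c}$$

$$A \approx \frac{2}{T}\left[R + \frac{1}{3}R^{3} + \frac{1}{5}R^{5}\right] \tag{5.4d}$$

where

$$R = [F - I_n][F + I_n]^{-1}. \tag{5.4e}$$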
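The direct-truncation method (5.4a)-(5.4e) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. It assumes a real square $F$ with $\mathrm{Re}(\sigma(F)) > 0$, so that the series converges. The function names and the test system in the `__main__` block are hypothetical.

```python
import numpy as np


def bilinear_r(F):
    """R = [F - I][F + I]^{-1}, as in (5.4e)."""
    I = np.eye(F.shape[0])
    # (F + I)^{-1} and (F - I) commute, so solving (F + I) R = (F - I) yields R.
    return np.linalg.solve(F + I, F - I)


def direct_truncation(F, T, terms):
    """First `terms` terms of (5.4a): (2/T) * sum_i R^(2i+1) / (2i+1)."""
    R = bilinear_r(F)
    R2 = R @ R
    power = R.copy()                 # holds R^(2i+1)
    total = np.zeros_like(R)
    for i in range(terms):
        total = total + power / (2 * i + 1)
        power = power @ R2
    return (2.0 / T) * total


if __name__ == "__main__":
    # Hypothetical test system: F = exp(A T) for a known diagonalizable A.
    A = np.array([[-1.0, 2.0], [0.0, -3.0]])
    T = 0.5
    w, V = np.linalg.eig(A)
    F = (V @ np.diag(np.exp(w * T)) @ np.linalg.inv(V)).real
    for terms in (1, 2, 5, 20):
        print(terms, np.linalg.norm(direct_truncation(F, T, terms) - A, 1))
```

With `terms = 1` the function returns the bilinear (Tustin) estimate (5.4b). With `terms = 2` and `terms = 3` it returns (5.4c) and (5.4d).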


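2) The matrix continued-fraction method is as follows:

$$A = \frac{1}{T}\ln(F) = \frac{2}{T}\left[R + \frac{1}{3}R^{3} + \frac{1}{5}R^{5} + \frac{1}{7}R^{7} + \cdots\right] \tag{5.5a}$$

$$= \frac{2}{T}R\left[K_1 + N\left[K_2 + N\left[K_3 + N\left[K_4 + N[\cdots]^{-1}\right]^{-1}\right]^{-1}\right]^{-1}\right]^{-1} \tag{5.5b}$$

$$\approx \frac{2}{T}R[K_1]^{-1} = \frac{2}{T}R \tag{5.5c}$$

$$\approx \frac{2}{T}R\left[K_1 + N[K_2]^{-1}\right]^{-1} = \frac{2}{T}R\left[I_n - \frac{1}{3}R^{2}\right]^{-1} \tag{5.5d}$$

$$\approx \frac{2}{T}R\left[K_1 + N\left[K_2 + N[K_3]^{-1}\right]^{-1}\right]^{-1} = \frac{2}{T}R\left[I_n - \frac{4}{15}R^{2}\right]\left[I_n - \frac{3}{5}R^{2}\right]^{-1} \tag{5.5e}$$

$$\approx \frac{2}{T}R\left[K_1 + N\left[K_2 + N\left[K_3 + N[K_4]^{-1}\right]^{-1}\right]^{-1}\right]^{-1} = \frac{2}{T}R\left[I_n - \frac{11}{21}R^{2}\right]\left[I_n - \frac{6}{7}R^{2} + \frac{3}{35}R^{4}\right]^{-1} \tag{5.5f}$$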
where $N = R^{2}$ and the matrix quotient $K_i\ (= k_i I_n)$ is the $i$th diagonal matrix. The $i$th scalar $k_i$ can be determined from the following Routh algorithm:

$$a_{1,1} = 1, \qquad a_{1,j} = 0 \quad \text{for } j = 2, 3, \ldots,$$

$$a_{2,j} = \frac{1}{2j - 1} \quad \text{for } j = 1, 2, \ldots,$$

$$a_{i,j} = a_{i-2,j+1} - k_{i-2}\,a_{i-1,j+1} \quad \text{for } j = 1, 2, \ldots \text{ and } i = 3, 4, \ldots,$$

and

$$k_i = a_{i,1}/a_{i+1,1} \quad \text{for } i = 1, 2, \ldots.$$
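The Routh recursion above and the nested quotients of (5.5b) can be checked with a short sketch. The $k_i$ are computed in exact rational arithmetic with Python's `fractions` module. The matrix evaluation works from the innermost quotient outward. The function names are illustrative assumptions. With one to four quotients, the evaluation should reproduce the closed forms (5.5c)-(5.5f).

```python
from fractions import Fraction

import numpy as np


def routh_quotients(count):
    """k_1..k_count from the Routh algorithm above, in exact rational arithmetic."""
    width = count + 2
    rows = [
        [Fraction(1)] + [Fraction(0)] * (width - 1),            # a_{1,j}
        [Fraction(1, 2 * j - 1) for j in range(1, width + 1)],   # a_{2,j} = 1/(2j-1)
    ]
    k = []
    for i in range(count):
        k.append(rows[i][0] / rows[i + 1][0])                    # k_i = a_{i,1}/a_{i+1,1}
        prev, cur = rows[i], rows[i + 1]
        # next row: a_{i+2,j} = a_{i,j+1} - k_i * a_{i+1,j+1}
        rows.append([prev[j + 1] - k[-1] * cur[j + 1] for j in range(len(cur) - 1)])
    return k


def continued_fraction(F, T, k):
    """(2/T) R [K_1 + N[K_2 + ... N[K_p]^{-1} ...]^{-1}]^{-1}, K_i = k_i I, N = R^2."""
    I = np.eye(F.shape[0])
    R = np.linalg.solve(F + I, F - I)      # R = [F - I][F + I]^{-1}
    N = R @ R
    inner = float(k[-1]) * I               # innermost quotient K_p
    for ki in reversed(k[:-1]):            # work outward: K_i + N [inner]^{-1}
        inner = float(ki) * I + N @ np.linalg.inv(inner)
    return (2.0 / T) * R @ np.linalg.inv(inner)


print(routh_quotients(4))   # -> [Fraction(1, 1), Fraction(-3, 1), Fraction(5, 4), Fraction(-28, 9)]
```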


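3) The geometric-series method is as follows:

$$A = \frac{1}{T}\ln(F) = \frac{2}{T}\left[R + \frac{1}{3}R^{3} + \cdots + \frac{1}{n-2}R^{n-2} + \sum_{i=0}^{\infty}\frac{1}{n+2i}R^{n+2i}\right] \tag{5.6a}$$

$$\approx \frac{2}{T}\left[R + \frac{1}{3}R^{3} + \cdots + \frac{1}{n-2}R^{n-2} + \sum_{i=0}^{\infty}\frac{1}{n(1+2/n)^{i}}R^{n+2i}\right] \tag{5.6b}$$

$$= \frac{2}{T}\left[R + \frac{1}{3}R^{3} + \cdots + \frac{1}{n-2}R^{n-2} + \frac{1}{n}R^{n}\left[I_n - \frac{R^{2}}{1+2/n}\right]^{-1}\right] \quad \text{for } |\sigma(R^{2})| < 1 + \frac{2}{n} \tag{5.6c}$$

$$A \approx \frac{2}{T}R\left[I_n - \frac{1}{3}R^{2}\right]^{-1} \quad \text{for } n = 1 \tag{5.6d}$$

$$A \approx \frac{2}{T}R\left[I_n - \frac{4}{15}R^{2}\right]\left[I_n - \frac{3}{5}R^{2}\right]^{-1} \quad \text{for } n = 3 \tag{5.6e}$$

$$A \approx \frac{2}{T}R\left[I_n - \frac{8}{21}R^{2} - \frac{4}{105}R^{4}\right]\left[I_n - \frac{5}{7}R^{2}\right]^{-1} \quad \text{for } n = 5 \tag{5.6f}$$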
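Below is a sketch of the geometric-series method (5.6c) for a general odd $n$. It works under the same assumption as before ($\mathrm{Re}(\sigma(F)) > 0$), and the function name is hypothetical. For $n = 1, 3, 5$ it coincides with the closed forms (5.6d)-(5.6f). Note that (5.6d) and (5.6e) are the same rational functions of $R$ as (5.5d) and (5.5e).

```python
import numpy as np


def geometric_series(F, T, n):
    """(5.6c): (2/T)[R + R^3/3 + ... + R^(n-2)/(n-2) + (1/n) R^n (I - R^2/(1+2/n))^{-1}]."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    I = np.eye(F.shape[0])
    R = np.linalg.solve(F + I, F - I)     # R = [F - I][F + I]^{-1}
    R2 = R @ R
    total = np.zeros_like(R)
    power = R.copy()                      # R^p for p = 1, 3, 5, ...
    for p in range(1, n, 2):              # explicit terms R^p / p with p < n
        total = total + power / p
        power = power @ R2
    # power now equals R^n; add the summed geometric tail (1/n) R^n [I - R^2/(1 + 2/n)]^{-1}
    tail = np.linalg.solve(I - R2 / (1.0 + 2.0 / n), power) / n
    return (2.0 / T) * (total + tail)
```

Summing the tail with one linear solve, instead of adding more powers of $R$, is what lets a low-order $n$ reach a good accuracy.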
The condition for the convergence [47] of the matrix series in (5.4a), (5.5a), and (5.6a) is $\mathrm{Re}(\sigma(F)) > 0$. Note that the matrix $A$ in (5.4b) and (5.5c) can be obtained by using the bilinear-transform method or the Tustin method [42]. That matrix is $A \approx (2/T)R = (2/T)[F - I_n][F + I_n]^{-1}$.

From (5.4e), we observe the following. If $\mathrm{Re}(\sigma(F)) > 0$, that is, all eigenvalues of the matrix $F$ lie in the right-half complex plane, then $|\sigma(R)| < 1$ and $|\sigma(N)| = |\sigma(R^{2})| < 1$. As a result, the first few terms of the infinite power series in (5.4a), (5.5a), and (5.6a) are the dominant terms. The desired matrix $A$ can then be obtained in three ways:

- by taking the first few dominant terms of the infinite power series in (5.4a);
- by taking the first few dominant matrix quotients $K_i$ of the matrix continued-fraction expansion in (5.5b);
- by taking the first few dominant terms of the infinite power series together with the associated geometric series $[I_n - R^{2}/(1 + 2/n)]^{-1}/n$ in (5.6c).

However, in general, the eigenvalues of the matrix $F$ are not available, and they do not always lie in the right-half complex plane. Therefore, the above three methods are not always efficient.

The purpose of this note is to develop a computational method for quickly determining the matrix $A$ from the matrix $F$. The method uses the principal $q$th root of a nonsingular matrix $F$ (denoted $\sqrt[q]{F}$, $q \geq 2$) [61], which places all eigenvalues of $\sqrt[q]{F}$ in the right half-plane. It then applies the methods in (5.4), (5.5), and (5.6) to $\sqrt[q]{F}$.


5.2 Determining Continuous-time State Equations from Discrete-time State Equations via the Principal qth Root Method

The properties of the matrix $\sqrt[q]{F}$ for $q \geq 2$ can be utilized to modify the above three approximation methods, as follows.


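Rewriting (5.3a) gives

$$A = \frac{1}{T}\ln(F) = \frac{1}{T}\ln\left(\sqrt[q]{F}\right)^{q} \tag{5.7a}$$

$$= \frac{q}{T}\ln\left(\sqrt[q]{F}\right) \tag{5.7b}$$

$$= \frac{2q}{T}\left[R + \frac{1}{3}R^{3} + \frac{1}{5}R^{5} + \cdots\right], \tag{5.7c}$$

where the matrix $\sqrt[q]{F}$ is the principal $q$th root of the matrix $F$, and $R = [\sqrt[q]{F} - I_n][\sqrt[q]{F} + I_n]^{-1}$. Thus, (5.4), (5.5), and (5.6) can be rewritten as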

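$$A = \frac{2q}{T}\left[R + \frac{1}{3}R^{3} + \frac{1}{5}R^{5} + \cdots\right] \tag{5.8a}$$

$$\approx \frac{2q}{T}R \tag{5.8b}$$

$$\approx \frac{2q}{T}\left[R + \frac{1}{3}R^{3}\right] \tag{5.8c}$$

$$A = \frac{2q}{T}R\left[K_1 + N\left[K_2 + N\left[K_3 + N[\cdots]^{-1}\right]^{-1}\right]^{-1}\right]^{-1} \tag{5.9a}$$

$$\approx \frac{2q}{T}R[K_1]^{-1} = \frac{2q}{T}R \tag{5.9b}$$

$$\approx \frac{2q}{T}R\left[K_1 + N[K_2]^{-1}\right]^{-1} = \frac{2q}{T}R\left[I_n - \frac{1}{3}R^{2}\right]^{-1} \tag{5.9c}$$

where $N = R^{2}$, and

$$A \approx \frac{2q}{T}\left[R + \frac{1}{3}R^{3} + \cdots + \frac{1}{n-2}R^{n-2} + \frac{1}{n}R^{n}\left[I_n - \frac{R^{2}}{1+2/n}\right]^{-1}\right] \quad \text{for } |\sigma(R^{2})| < 1 + \frac{2}{n}. \tag{5.10}$$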


The condition for the convergence of the infinite power series in (5.7) becomes $\arg(\sigma(F)) \neq \pi$. Eigenvalues of the matrix $F$ lying on the negative real axis of the complex plane are excluded by this condition, because the logarithm of a negative real eigenvalue is not unique.

Note that the convergence region of the modified infinite power series in (5.8), (5.9), and (5.10) has been greatly enlarged. It has grown from the original $\mathrm{Re}(\sigma(F)) > 0$ in (5.4), (5.5), and (5.6) to $\arg(\sigma(F)) \neq \pi$ in (5.8), (5.9), and (5.10). When $q \geq 2$, all $\sigma(\sqrt[q]{F})$ lie inside the sector $(-\pi/q, +\pi/q)$ of the complex plane. Therefore, $\mathrm{Re}(\sigma(\sqrt[q]{F})) > 0$, $|\sigma(R)| < 1$, $|\sigma(R^{2})| < 1$ and $|\sigma(R^{2})| < 1 + 2/n$. If $q \gg 2$, then $|\sigma(R)| \ll 1$. Thus, the desired matrix $A$ can be determined quickly by taking the first few dominant terms of the right-hand side of the equations in (5.8), (5.9), and (5.10).

When the eigenvalues of the nonsingular real matrix $F$ are not available, and $F$ may have negative real eigenvalues, we can compute $\sqrt[q]{F}$ with the algorithm in (3.8), taking $F$ as its input matrix.

- If $\sqrt[q]{F}$ is a complex matrix, then $F$ has negative real eigenvalues, and the desired real matrix $A$ cannot be obtained by the proposed method.
- If $\sqrt[q]{F}$ is a real matrix, then $\arg(\sigma(F)) \neq \pi$, and the methods in (5.8), (5.9), and (5.10) can be applied to obtain the desired real matrix $A$.
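Here is a minimal sketch of the modified method (5.7)-(5.8): take the principal $q$th root, form $R$ from it, and truncate. As an assumption, the principal root is computed from an eigendecomposition of a diagonalizable $F$, rather than by the algorithm in (3.8). The check for a complex root mirrors the test for negative real eigenvalues described above. The function names are hypothetical.

```python
import numpy as np


def principal_root(F, q):
    """Principal q-th root via an eigendecomposition (assumes F is diagonalizable).

    This stands in for the algorithm in (3.8), which is not reproduced here.
    np.power on complex eigenvalues uses the principal branch, so every
    root has an argument within [-pi/q, pi/q].
    """
    w, V = np.linalg.eig(np.asarray(F, dtype=complex))
    return V @ np.diag(np.power(w, 1.0 / q)) @ np.linalg.inv(V)


def log_via_root(F, T, q, terms, tol=1e-9):
    """Direct truncation (5.8): A ~ (2q/T) sum_{i<terms} R^(2i+1)/(2i+1), R built from F^(1/q)."""
    G = principal_root(F, q)
    if np.max(np.abs(G.imag)) > tol * max(1.0, np.max(np.abs(G.real))):
        # A complex root signals negative real eigenvalues of F: no real A is produced.
        raise ValueError("principal root is complex: F has negative real eigenvalues")
    G = G.real
    I = np.eye(G.shape[0])
    R = np.linalg.solve(G + I, G - I)      # R = [G - I][G + I]^{-1}
    R2 = R @ R
    power, total = R.copy(), np.zeros_like(R)
    for i in range(terms):
        total = total + power / (2 * i + 1)
        power = power @ R2
    return (2.0 * q / T) * total
```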




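5.3 Illustrative Example

Let an unstable discrete-time system matrix $F$ be

$$F = \begin{bmatrix} 30 & -100 \\ 50 & -70 \end{bmatrix} \quad \text{and} \quad \sigma(F) = \{-20 \pm j50\}. \tag{5.11}$$

The exactly equivalent continuous-time system matrix $A$ is

$$A = \begin{bmatrix} 5.9375 & -3.9026 \\ 1.9513 & 2.0349 \end{bmatrix} \quad \text{with} \quad \sigma(A) = \{3.9862 \pm j1.9513\}, \text{ and } T = 1. \tag{5.12}$$

Since $\mathrm{Re}(\sigma(F)) < 0$, the matrices $A$ obtained from (5.4), (5.5), and (5.6) are poor approximations of $(1/T)\ln(F)$. However, the desired matrix $A$ can be obtained from (5.8), (5.9), and (5.10) as follows. The computed $\sqrt[q]{F}$ with $q = 4$ is

$$\sqrt[4]{F} = \begin{bmatrix} 3.6627 & -2.5394 \\ 1.2697 & 1.1233 \end{bmatrix} \quad \text{with} \quad \sigma(\sqrt[4]{F}) = \{2.3930 \pm j1.2697\}.$$

Note that $\arg(\sigma(\sqrt[4]{F})) \in (-\pi/4, +\pi/4)$, $|\sigma(\sqrt[4]{F})| > 1$, and $\mathrm{Re}[\sigma(\sqrt[4]{F})] > 0$.

Direct truncation. Let $A_D^{(N)}$ denote the approximation of $(q/T)\ln(\sqrt[4]{F})$ with $q = 4$ in (5.8) obtained by taking the first $N$ dominant terms. These approximations are

$$A_D^{(1)} = \begin{bmatrix} 5.4115 & -3.0958 \\ 1.5479 & 2.3157 \end{bmatrix}, \quad A_D^{(3)} = \begin{bmatrix} 5.9466 & -3.8945 \\ 1.9472 & 2.0521 \end{bmatrix}, \quad \text{and} \quad A_D^{(6)} = \begin{bmatrix} 5.9376 & -3.9029 \\ 1.9514 & 2.0347 \end{bmatrix}.$$

The associated errors $\|A - A_D^{(N)}\|/\|A\|$ for $N = 1, 3, 6$ are $1.4 \times 10^{-1}$, $3.2 \times 10^{-3}$, and $5.9 \times 10^{-5}$, respectively.

Continued fraction. Let $A_C^{(N)}$ denote the approximant of $(q/T)\ln(\sqrt[4]{F})$ with $q = 4$ in (5.9) obtained by taking the first $N$ dominant quotients. These approximants are $A_C^{(1)} = A_D^{(1)}$,

$$A_C^{(2)} = \begin{bmatrix} 5.9281 & -3.8459 \\ 1.9229 & 2.0823 \end{bmatrix}, \quad A_C^{(3)} = \begin{bmatrix} 5.9399 & -3.9021 \\ 1.9510 & 2.0378 \end{bmatrix}, \quad \text{and} \quad A_C^{(4)} = \begin{bmatrix} 5.9378 & -3.9029 \\ 1.9514 & 2.0349 \end{bmatrix}.$$

The associated errors $\|A - A_C^{(N)}\|/\|A\|$ for $N = 1, 2, 3, 4$ are $1.4 \times 10^{-1}$, $1.3 \times 10^{-2}$, $4.4 \times 10^{-4}$, and $5.3 \times 10^{-5}$, respectively.

Geometric series. Let $A_G^{(N)}$ denote the approximation of $(q/T)\ln(\sqrt[4]{F})$ with $q = 4$ in (5.10) obtained by taking the first $N$ dominant terms. These approximations are $A_G^{(1)} = A_C^{(2)}$, $A_G^{(2)} = A_C^{(3)}$, and

$$A_G^{(3)} = \begin{bmatrix} 5.9380 & -3.9030 \\ 1.9515 & 2.0350 \end{bmatrix}.$$

The associated errors $\|A - A_G^{(N)}\|/\|A\|$ for $N = 1, 2, 3$ are $1.3 \times 10^{-2}$, $4.4 \times 10^{-4}$, and $7.8 \times 10^{-5}$, respectively.

From our experience, the direct-truncation method in (5.8) often gives satisfactory approximations when $q$ is a large number. When $q$ is a small number, the matrix continued-fraction method in (5.9) converges faster than both the geometric-series method in (5.10) and the direct-truncation method in (5.8).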
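The sketch below recomputes the quantities of this example with NumPy. It rests on two assumptions. First, the text does not name the norm, so the relative error is measured here in the matrix 1-norm. Second, an eigendecomposition is used for both the reference logarithm and the principal 4th root, in place of the algorithm in (3.8). The names `direct`, `cf` and `geo` are illustrative.

```python
import numpy as np

# Example (5.11): F, sampling period T = 1, principal 4th root (q = 4).
F = np.array([[30.0, -100.0], [50.0, -70.0]])
T, q = 1.0, 4


def matrix_function(M, f):
    """f(M) through an eigendecomposition; assumes M is diagonalizable (true here)."""
    w, V = np.linalg.eig(M.astype(complex))
    return V @ np.diag(f(w)) @ np.linalg.inv(V)


A_ref = matrix_function(F, np.log).real / T          # principal logarithm, reference only
G = matrix_function(F, lambda w: np.power(w, 1.0 / q)).real
I = np.eye(2)
R = np.linalg.solve(G + I, G - I)
R2 = R @ R
inv = np.linalg.inv
s = 2.0 * q / T


def direct(N):
    """First N terms of (5.8a)."""
    return s * sum(np.linalg.matrix_power(R, 2 * i + 1) / (2 * i + 1) for i in range(N))


# Closed forms of (5.9) with 1..4 quotients (the q-scaled versions of (5.5c)-(5.5f)).
cf = {
    1: s * R,
    2: s * R @ inv(I - R2 / 3),
    3: s * R @ (I - 4 * R2 / 15) @ inv(I - 3 * R2 / 5),
    4: s * R @ (I - 11 * R2 / 21) @ inv(I - 6 * R2 / 7 + 3 * R2 @ R2 / 35),
}
# Geometric series (5.10) with n = 1, 3, 5.
geo = {1: cf[2], 2: cf[3], 3: s * R @ (I - 8 * R2 / 21 - 4 * R2 @ R2 / 105) @ inv(I - 5 * R2 / 7)}


def rel_err(B):
    """Relative error in the matrix 1-norm (maximum column sum)."""
    return np.linalg.norm(A_ref - B, 1) / np.linalg.norm(A_ref, 1)


for N in (1, 3, 6):
    print("direct", N, rel_err(direct(N)))
for N, B in cf.items():
    print("continued fraction", N, rel_err(B))
for N, B in geo.items():
    print("geometric", N, rel_err(B))
```

The printed errors should be of the same order as those quoted above.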

5.4  Conclusion 

New computational methods have been presented for quick modeling of the equivalent continuous-time state equations from the discrete-time state equations. They combine the principal $q$th root of a discrete-time system matrix with the direct-truncation method, the matrix continued-fraction method, or the geometric-series method. The proposed method is useful for identifying a continuous-time system from observed sampled input-output data, and for the design of sampled-data control systems.
Chapter 6

Rectangular  and  Polar  Representations  of  a  Complex  Matrix 


This chapter presents some new definitions of the real and imaginary parts and the associated amplitude and phase of a real or complex matrix. Computational methods, which utilize the properties of the matrix-sign function and the principal $n$th root of a complex matrix, are given for finding these quantities. A geometric-series method is newly developed for approximating the matrix-valued function $\tan^{-1}(X)$, the principal branch of the arc tangent of the matrix $X$. Several illustrative examples are presented [75].


6.1  Introduction 

The definitions of the real and imaginary parts of a complex number in rectangular coordinates, and of its amplitude and phase in polar coordinates, are well known. They are commonly used in mathematical science and in control systems, for example in complex-variable analysis applied to linear control systems. However, extensions of these definitions to a complex matrix (which may be defective), and their applications, have not been generally investigated by researchers.

For simplicity of notation throughout this chapter, the following conventions are used:

- $\mathrm{Re}(A)$ is the real matrix containing the real part of each element of the matrix $A$.
- $\mathrm{Im}(A)$ is the real matrix consisting of the imaginary part of each element of the matrix $A$.
- $\sqrt[n]{A}$ denotes the principal $n$th root of the matrix $A$.
- $\tan^{-1}(A)$ is the principal branch of the arc tangent of the matrix $A$.

The detailed definitions of the matrices $\sqrt[n]{A}$ and $\tan^{-1}(A)$ are reviewed and stated, respectively, as follows.

Definition 6.1.1 [20,21]

Let the eigenspectrum of a nonsingular matrix $A \in \mathbb{C}^{m\times m}$ be $\sigma(A) = \{\lambda_i,\ i = 1, 2, \ldots, m\}$, with $\lambda_i \neq 0$ and $\arg(\lambda_i) \neq \pi$.

(1) The principal $n$th root of $A$ is denoted as $\sqrt[n]{A} \in \mathbb{C}^{m\times m}$, where $n$ is a positive integer. It is such that $(\sqrt[n]{A})^{n} = A$, and $\arg(s_i) \in (-\pi/n, \pi/n)$ for every $s_i = \sqrt[n]{\lambda_i} \in \sigma(\sqrt[n]{A})$, $i = 1, 2, \ldots, m$. Here $\sqrt[n]{\lambda_i}$ is the principal $n$th root of $\lambda_i$.

(2) The matrix $\tan^{-1}(A)$ has the property that $\sigma(\tan^{-1}(A)) = \{\tan^{-1}(\mathrm{Im}(\lambda_i)/\mathrm{Re}(\lambda_i))\} = \{\arg(\lambda_i) \in (-\pi, \pi),\ i = 1, 2, \ldots, m\}$.

□

Computational algorithms [21,61] are available for finding the principal $n$th roots of complex matrices, and computational methods for determining the matrix $\tan^{-1}(A)$ are proposed in this chapter. Note that previous algorithms [4,18,49] for finding the $n$th root of a matrix may not result in the principal $n$th root of the matrix.

The straightforward extension of the definitions of the real and imaginary parts and the amplitude and phase from a scalar to a matrix can be described as follows.

Let $A \in \mathbb{C}^{m\times m}$ be a nonsingular matrix with $\sigma(A) = \{\lambda_i = \alpha_i + j\beta_i$ for $i = 1, 2, \ldots, m\}$, where $j = \sqrt{-1}$. Then the rectangular representation of a complex matrix $A$ would be

$$A = \mathrm{Re}(A) + j\,\mathrm{Im}(A), \tag{6.1a}$$

and the polar representation of $A$ would be

$$A = D\exp(j\Phi) \tag{6.1b}$$

or

$$A = \exp(j\Phi)D, \tag{6.1c}$$

where the matrix $D = [(\mathrm{Re}(A))^{2} + (\mathrm{Im}(A))^{2}]^{1/2}$, and the matrix $\Phi$ is either $\tan^{-1}((\mathrm{Re}(A))^{-1}\mathrm{Im}(A))$ or $\tan^{-1}(\mathrm{Im}(A)(\mathrm{Re}(A))^{-1})$.

If the matrices $\mathrm{Re}(A)$ and $\mathrm{Im}(A)$ in (6.1) do not contain the modal matrix of $A$, then $\sigma(\mathrm{Re}(A)) \neq \mathrm{Re}(\sigma(A))$ and $\sigma(\mathrm{Im}(A)) \neq \mathrm{Im}(\sigma(A))$. In general, the representation in (6.1b) or (6.1c) is then not the polar representation of $A$, because $|\sigma(D)| \neq |\sigma(A)|$ and $\sigma(\Phi) \neq \arg(\sigma(A))$. Another important consequence is that $D\exp(j\Phi) \neq \exp(j\Phi)D$. In other words, the polar representation in (6.1) does not preserve the commutative property of matrix multiplication. This is because the matrices $\mathrm{Re}(A)$, $\mathrm{Im}(A)$, $D$ and $\Phi$ do not contain the modal matrix of $A$. As a result, it is difficult to generalize a scalar-valued function to a matrix-valued function, and to develop complex-variable approaches to the analysis of linear multivariable control systems.
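Below is a small numerical check of the claims about the elementwise split (6.1a), using a hypothetical non-normal matrix. $\mathrm{Re}(A)$ and $\mathrm{Im}(A)$ reassemble $A$, but they do not commute, and $\sigma(\mathrm{Re}(A))$ differs from $\mathrm{Re}(\sigma(A))$.

```python
import numpy as np

# Hypothetical non-normal complex matrix, used only for illustration.
A = np.array([[1 + 2j, 3 - 1j], [0.5j, -2 + 1j]])
ReA, ImA = A.real, A.imag                     # elementwise real and imaginary parts

print(np.allclose(ReA + 1j * ImA, A))         # the split always reassembles A
print(np.allclose(ReA @ ImA, ImA @ ReA))      # but the two parts need not commute
print(np.sort(np.linalg.eigvals(ReA).real))   # eigenvalues of Re(A)
print(np.sort(np.linalg.eigvals(A).real))     # real parts of the eigenvalues of A
```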

Another popular rectangular representation [52] of the matrix $A$ is

$$A = (A + A^{*})/2 + j[(A - A^{*})/2j], \tag{6.2}$$

where the asterisk superscript (for Hermitian) designates the conjugate transpose. Suppose the matrix $A$ is not a normal matrix [48,51]. Then the real part $(A + A^{*})/2$ and the imaginary part $(A - A^{*})/2j$ do not contain the modal matrix of $A$, and they do not commute. Therefore, the representation in (6.2) is not suitable for defining the amplitude and phase of $A$.

A formal polar representation [51] of a nonsingular matrix $A$ is

$$A = HU, \tag{6.3}$$

where the matrix $H$ is a square root of the Hermitian matrix $(AA^{*})$. The matrix $U$ is a unitary matrix with $U = H^{-1}A = \exp(j\Psi)$, where $\Psi = -j\ln(U)$.

If the matrix $A$ is not a normal matrix, then the matrices $H$ and $U$ do not contain the modal matrix of $A$. Also, $HU \neq UH$, $|\sigma(H)| \neq |\sigma(A)|$ and $\sigma(\Psi) \neq \arg(\sigma(A))$. As a result, the representation in (6.3) is not suitable for defining the real and imaginary parts of $A$. Its application to complex-variable analysis and to computational problems is therefore limited.

For example, if the matrix $A$ is a defective matrix, then $|\sigma(A)| \neq |\sigma(H)|$. As a result, the matrix $H$ cannot be utilized to normalize the amplitude of the matrix $A$ for reducing the computational error of $A^{k}$, where $k$ is a large positive integer. The need for the computation of $A^{k}$ and its applications can be found in [9,26,36].
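The same kind of check applies to (6.2) and (6.3), again with a hypothetical non-normal matrix. The polar factors are built from NumPy's SVD, using $H = W\Sigma W^{*}$ and $U = WV^{*}$. This is one standard way to obtain them, not necessarily the procedure of [51].

```python
import numpy as np

# Hypothetical non-normal, nonsingular matrix (eigenvalues 1 and 3).
A = np.array([[1.0, 2.0], [0.0, 3.0]])

# Hermitian / skew-Hermitian split (6.2).
P = (A + A.conj().T) / 2
Q = (A - A.conj().T) / 2j

# Polar form (6.3) from the SVD A = W diag(s) Vh.
W, s, Vh = np.linalg.svd(A)
H = W @ np.diag(s) @ W.conj().T      # Hermitian square root of A A^*
U = W @ Vh                           # unitary factor

print(np.allclose(P + 1j * Q, A), np.allclose(P @ Q, Q @ P))
print(np.allclose(H @ U, A), np.allclose(H @ U, U @ H))
print(np.sort(s), np.sort(np.abs(np.linalg.eigvals(A))))   # |sigma(H)| vs |sigma(A)|
```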

This chapter presents some new definitions of the real and imaginary parts and the amplitude and phase of a complex matrix. Procedures are given for computing these matrices, and several illustrative examples are presented. The aims of this chapter are primarily to develop theoretical tools rather than highly efficient computational algorithms.

This chapter is organized as follows:

- In Section 6.2, we define two different rectangular and polar representations of a matrix, and give illustrative examples.
- In Section 6.3, we develop computational procedures for finding the projected imaginary part ($A_I$), the projected real part ($A_R$), the amplitude ($A_\rho$) and the phase ($A_\theta$) of the matrix $A$.
- An illustrative example is shown in Section 6.4.
- The results are summarized in Section 6.5.

6.2  Rectangular  and  Polar  Representations  of  a  Matrix 

Let us first define the rectangular and polar representations of a complex matrix, which may be a defective matrix [54], in the following way.

Definition 6.2.1

Consider a matrix $A \in \mathbb{C}^{m\times m}$ with eigenspectrum

$$\sigma(A) = \left\{\lambda_i = \alpha_i + j\beta_i \ \text{for } i = 1, 2, \ldots, k \text{ with multiplicity } m_i,\ \sum_{i=1}^{k} m_i = m,\ \alpha_i \neq 0\right\}$$

and associated modal matrix $M \in \mathbb{C}^{m\times m}$. Suppose $A$, which may be defective, satisfies $|\arg(\sigma(A))| \neq \pi/4$ or $3\pi/4$. Then $A$ can be described in rectangular coordinates as

$$A = MJM^{-1} = M[\mathrm{Re}(J)]M^{-1} + jM[\mathrm{Im}(J)]M^{-1} = \bar{A}_R + j\bar{A}_I, \tag{6.4a}$$

where the matrix $J$ is of Jordan form. The matrices $\bar{A}_R\,(= M[\mathrm{Re}(J)]M^{-1})$ and $\bar{A}_I\,(= M[\mathrm{Im}(J)]M^{-1})$ are the real and imaginary parts of the matrix $A$, respectively. The polar representation of the complex matrix $A$ is

$$A = \bar{A}_\rho \angle \bar{A}_\theta = \bar{A}_\rho\exp(j\bar{A}_\theta) = \exp(j\bar{A}_\theta)\bar{A}_\rho, \tag{6.4b}$$

where the matrix

$$\bar{A}_\rho = (\bar{A}_R^{2} + \bar{A}_I^{2})^{1/2} \tag{6.4c}$$

is defined as the amplitude of the matrix $A$, and the matrix

$$\bar{A}_\theta = \tan^{-1}(\bar{A}_I\bar{A}_R^{-1}) = \tan^{-1}(\bar{A}_R^{-1}\bar{A}_I) \tag{6.4d}$$

is defined as the phase of the matrix $A$.

The projected real and imaginary parts of the matrix $A$ can be computed in polar coordinates as

$$\bar{A}_R = \bar{A}_\rho\cos(\bar{A}_\theta) = \cos(\bar{A}_\theta)\bar{A}_\rho \tag{6.4e}$$

and

$$\bar{A}_I = \bar{A}_\rho\sin(\bar{A}_\theta) = \sin(\bar{A}_\theta)\bar{A}_\rho. \tag{6.4f}$$

□

Note that the matrices $\bar{A}_R$ and $\bar{A}_I$ both contain the same modal matrix of $A$. Therefore, $\bar{A}_R$ and $\bar{A}_I$ commute, and so do $\bar{A}_\rho$ and $\exp(j\bar{A}_\theta)$. When the matrix $A$ is defective, the nontrivial elements on the superdiagonal of the Jordan matrix $J$ may be complex numbers. In that case the matrices $\mathrm{Re}(J)$ and $\mathrm{Im}(J)$ may not be diagonal matrices.
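For a diagonalizable $A$, the Jordan matrix is diagonal, and Definition 6.2.1 reduces to applying scalar functions to the eigenvalues. The sketch below assumes exactly that case. It uses `numpy.linalg.eig` and takes the phase with `np.angle`. This avoids the arc-tangent branch problem, but only because the eigenvalues are computed explicitly. The test matrix is hypothetical.

```python
import numpy as np


def rect_polar_diagonalizable(A):
    """Definition 6.2.1 specialized to a diagonalizable A (so J = diag(lambda_i))."""
    lam, M = np.linalg.eig(A)
    M_inv = np.linalg.inv(M)

    def lift(d):                      # M diag(d) M^{-1}
        return M @ np.diag(d) @ M_inv

    A_R = lift(lam.real)                          # (6.4a), real part
    A_I = lift(lam.imag)                          # (6.4a), imaginary part
    A_rho = lift(np.abs(lam))                     # (6.4c), amplitude
    A_theta = lift(np.angle(lam))                 # (6.4d), principal argument in (-pi, pi]
    exp_j_theta = lift(np.exp(1j * np.angle(lam)))
    return A_R, A_I, A_rho, A_theta, exp_j_theta


A = np.array([[1 + 2j, 3 - 1j], [0.5j, -2 + 1j]])   # hypothetical, distinct eigenvalues
A_R, A_I, A_rho, A_theta, E = rect_polar_diagonalizable(A)
print(np.allclose(A_R + 1j * A_I, A), np.allclose(A_R @ A_I, A_I @ A_R))
print(np.allclose(A_rho @ E, A), np.allclose(A_rho @ E, E @ A_rho))
```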

Suppose the eigenvalues, associated eigenvectors and structure of the Jordan matrix $J$ of a defective matrix $A$ in (6.4) are known. Then the amplitude matrix $\bar{A}_\rho$ in (6.4c) can be determined by finding the principal square root of the matrix $\bar{A}_R^{2} + \bar{A}_I^{2}$, via the algorithms developed in Chapter 3. However, determining the principal branch of $\tan^{-1}(\bar{A}_R^{-1}\bar{A}_I)$ for the phase matrix $\bar{A}_\theta$ in (6.4d) is more complicated than determining $\bar{A}_\rho$. An illustrative example is shown as follows.

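Example 6.2.1

Consider a defective complex matrix $A$,

$$A = \begin{bmatrix} \alpha + j\beta & 1 \\ 0 & \alpha + j\beta \end{bmatrix} \quad \text{with } \alpha \neq 0.$$

Following Definition 6.2.1 with $A = J$ and $M = I_2$, where $I_2$ is a $2 \times 2$ identity matrix, we obtain

$$\bar{A}_R = \mathrm{Re}(A) = \begin{bmatrix} \alpha & 1 \\ 0 & \alpha \end{bmatrix}, \qquad \bar{A}_I = \mathrm{Im}(A) = \begin{bmatrix} \beta & 0 \\ 0 & \beta \end{bmatrix},$$

$$\bar{A}_\rho = (\bar{A}_R^{2} + \bar{A}_I^{2})^{1/2} = \begin{bmatrix} \sqrt{\alpha^{2}+\beta^{2}} & \alpha/\sqrt{\alpha^{2}+\beta^{2}} \\ 0 & \sqrt{\alpha^{2}+\beta^{2}} \end{bmatrix},$$

and

$$\bar{A}_\theta = \tan^{-1}(\bar{A}_R^{-1}\bar{A}_I) = \tan^{-1}(A'),$$

where

$$A' = \bar{A}_R^{-1}\bar{A}_I = \begin{bmatrix} \beta/\alpha & -\beta/\alpha^{2} \\ 0 & \beta/\alpha \end{bmatrix}.$$

Since the matrix $A'$ has Jordan canonical form, the matrix $\bar{A}_\theta$ can be determined by using the standard formula [51] (Gantmacher 1959, p. 98) as follows:

$$\bar{A}_\theta = \tan^{-1}(A') = \begin{bmatrix} \tan^{-1}(\beta/\alpha) & z \\ 0 & \tan^{-1}(\beta/\alpha) \end{bmatrix} \quad \text{for } |\sigma(A')| = \left|\frac{\beta}{\alpha}\right| < 1,$$

where

$$z = -\frac{\beta}{\alpha^{2}}\left[1 - \left(\frac{\beta}{\alpha}\right)^{2} + \left(\frac{\beta}{\alpha}\right)^{4} - \cdots\right].$$

Note that the evaluation of $\tan^{-1}(\beta/\alpha)$ depends upon the signs of $\alpha$ and $\beta$, and the determination of the infinite series $z$ depends on the magnitude of $|\beta/\alpha|$. For example, when $\alpha = |\beta|$, i.e., $|\arg(\sigma(A))| = \pi/4$, the infinite series $z$ does not converge; its partial sums alternate between $-\beta/\alpha^{2}$ and 0. Hence, the matrix $\bar{A}_\theta$ is not the desired phase matrix.

We conclude that the direct use of the standard formula [51] (Gantmacher 1959, p. 98) for $\tan^{-1}(A')$ will not yield the desired phase matrix $\bar{A}_\theta$ in two cases. The first is when $A$ is a defective matrix with any $|\arg(\sigma(A))| = \pi/4$ or $3\pi/4$ (i.e., $|\sigma(A')| = 1$). The second is when $|\sigma(A')| > 1$. A computational method will be developed in Section 6.3 to overcome this difficulty and to determine the desired $\bar{A}_\theta$.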
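The convergence behaviour of the series $z$ in Example 6.2.1 is easy to see numerically. This plain-Python sketch uses hypothetical function names. It compares partial sums with the limit $-\beta/(\alpha^{2}+\beta^{2})$ when $|\beta/\alpha| < 1$, and shows the oscillation when $\alpha = |\beta|$.

```python
def z_partial_sums(alpha, beta, terms):
    """Partial sums of z = -(beta/alpha^2) [1 - (beta/alpha)^2 + (beta/alpha)^4 - ...]."""
    t = beta / alpha
    sums, acc = [], 0.0
    for i in range(terms):
        acc += (-t * t) ** i
        sums.append(-(beta / alpha ** 2) * acc)
    return sums


def z_limit(alpha, beta):
    """Sum of the series when |beta/alpha| < 1, namely -beta/(alpha^2 + beta^2)."""
    return -beta / (alpha ** 2 + beta ** 2)


print(z_partial_sums(2.0, 1.0, 40)[-1], z_limit(2.0, 1.0))   # |beta/alpha| = 0.5: converges
print(z_partial_sums(1.0, 1.0, 6))                           # alpha = |beta|: oscillates
```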

Let us define an additional notation, which will be used throughout this chapter.

Definition 6.2.2

Let the matrix $J \in \mathbb{C}^{m\times m}$ be the Jordan matrix of a defective matrix $A \in \mathbb{C}^{m\times m}$. Let the diagonal matrix $\Lambda \in \mathbb{C}^{m\times m}$ contain only the diagonal elements of $J$, so that $\sigma(A) = \sigma(J) = \sigma(\Lambda)$. Then the matrix $J_1$ is defined as $J - \Lambda$, which contains only the elements on the superdiagonal of $J$. The nontrivial elements in $J_1$ may be complex numbers. □

Definition 6.2.1 cannot be utilized for finding the rectangular and polar representations of the matrix $A$ when $A$ is a defective matrix with $|\arg(\sigma(A))| = \pi/4$ or $3\pi/4$. We therefore define alternative rectangular and polar representations of the matrix, as follows. These relax the constraint in Definition 6.2.1. They also allow a computational method that does not require actually knowing the eigenvalues, eigenvectors or structure of the Jordan form.

Definition 6.2.3

Let $A \in \mathbb{C}^{m\times m}$ be a matrix with eigenspectrum $\sigma(A) = \{\lambda_i = \alpha_i + j\beta_i$ for $i = 1, 2, \ldots, k$ with multiplicity $m_i$, and $\alpha_i \neq 0\}$ and associated modal matrix $M \in \mathbb{C}^{m\times m}$. Then the matrix $A$ can be represented as

$$A = MJM^{-1} = M[\Lambda + J_1]M^{-1} = (A_{Rd} + jA_{Id}) + A_1 = A_{\rho d}\exp(jA_{\theta d}) + A_1, \tag{6.5a}$$

where

- $A_{Rd} = M[\mathrm{Re}(\Lambda)]M^{-1}$,
- $A_{Id} = M[\mathrm{Im}(\Lambda)]M^{-1}$,
- $A_1 = MJ_1M^{-1}$,
- $A_{\rho d} = (A_{Rd}^{2} + A_{Id}^{2})^{1/2}$,
- $A_{\theta d} = \tan^{-1}(A_{Rd}^{-1}A_{Id}) = \tan^{-1}(A_{Id}A_{Rd}^{-1})$.

The polar and rectangular representations of the matrix $A$ can be defined as follows:

$$A = [A_{\rho d} + A_1\exp(-jA_{\theta d})]\exp(jA_{\theta d}) = A_\rho\exp(jA_\theta) = \exp(jA_\theta)A_\rho = A_R + jA_I, \tag{6.5b}$$

where

$$A_\rho = A_{\rho d} + A_1\exp(-jA_\theta) = (A_R^{2} + A_I^{2})^{1/2},$$

$$A_\theta = A_{\theta d} = \tan^{-1}(A_{Id}A_{Rd}^{-1}) = \tan^{-1}(A_{Rd}^{-1}A_{Id}) = \tan^{-1}(A_IA_R^{-1}) = \tan^{-1}(A_R^{-1}A_I),$$

$$A_R = A_\rho\cos(A_\theta) = \cos(A_\theta)A_\rho,$$

and

$$A_I = A_\rho\sin(A_\theta) = \sin(A_\theta)A_\rho.$$

□

The matrices $A_\rho$, $A_\theta$, $A_R$ and $A_I$ are defined as the amplitude, phase, real part and imaginary part of the matrix $A$, respectively. Note that in general these matrices are different from those defined earlier, which are indicated by an overbar.

Note also that an additional lower subscript $d$ on a matrix in the above definition denotes that the matrix is nondefective. Note further that $A_R \neq A_{Rd} + A_1$ and $A_I \neq A_{Id}$. A simple example follows.
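Example 6.2.2

Consider a defective complex matrix $A$ with $\arg(\sigma(A)) = \pi/4$,

$$A = \begin{bmatrix} 1 + j1 & 1 \\ 0 & 1 + j1 \end{bmatrix}.$$

Following Definition 6.2.3 with $A = J$, $M = I_2$, $\Lambda = \mathrm{diag}(1 + j1, 1 + j1)$ and $J_1 = J - \Lambda$, we obtain

$$A_{Rd} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad A_{Id} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad A_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},$$

$$A_{\rho d} = (A_{Rd}^{2} + A_{Id}^{2})^{1/2} = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix}.$$

The desired phase matrix $A_\theta$ and amplitude matrix $A_\rho$ are

$$A_\theta = \tan^{-1}(A_{Rd}^{-1}A_{Id}) = \mathrm{diag}[\tan^{-1}(1), \tan^{-1}(1)] = \mathrm{diag}[\pi/4, \pi/4]$$

and

$$A_\rho = A_{\rho d} + A_1\exp(-jA_\theta) = A_{\rho d} + A_1\,\mathrm{diag}[\exp(-j\pi/4), \exp(-j\pi/4)] = \begin{bmatrix} \sqrt{2} & \exp(-j\pi/4) \\ 0 & \sqrt{2} \end{bmatrix}.$$

Also, the desired real matrix $A_R$ and imaginary matrix $A_I$ are

$$A_R = A_\rho\cos(A_\theta) = A_\rho\,\mathrm{diag}[\cos(\pi/4), \cos(\pi/4)] = \begin{bmatrix} 1 & (1 - j1)/2 \\ 0 & 1 \end{bmatrix}$$

and

$$A_I = A_\rho\sin(A_\theta) = A_\rho\,\mathrm{diag}[\sin(\pi/4), \sin(\pi/4)] = \begin{bmatrix} 1 & (1 - j1)/2 \\ 0 & 1 \end{bmatrix}.$$

Note that Definition 6.2.1 cannot be applied directly to determine the phase of $A$ in Example 6.2.2, because $\arg(\sigma(A)) = \pi/4$. Note also the following properties of the matrices obtained:

- $A_\rho$ and $\exp(jA_\theta)$ commute.
- $\sigma(A_\rho) = |\sigma(A)|$ and $\sigma(A_\theta) = \arg(\sigma(A))$.
- $\sigma(A_R) = \mathrm{Re}(\sigma(A))$ and $\sigma(A_I) = \mathrm{Im}(\sigma(A))$.
- $A_\rho$, $A_\theta$, $A_R$ and $A_I$ all contain the modal matrix of $A$.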
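Definition 6.2.3 can be evaluated directly when $A$ is already in Jordan form ($M = I$). The sketch below assumes that case and reads $\Lambda$ and $J_1$ off the matrix, instead of computing them with Algorithm 6.3.1. It reproduces Example 6.2.2. The function name is hypothetical.

```python
import numpy as np


def polar_from_jordan(J):
    """Definition 6.2.3 for A = J already in Jordan form (M = I)."""
    lam = np.diag(J)                        # Lambda: diagonal of J
    J1 = J - np.diag(lam)                   # superdiagonal part (Definition 6.2.2)
    theta = np.angle(lam)                   # A_theta = A_theta_d = diag(arg lambda_i)
    A_rho = np.diag(np.abs(lam)) + J1 @ np.diag(np.exp(-1j * theta))
    A_R = A_rho @ np.diag(np.cos(theta))
    A_I = A_rho @ np.diag(np.sin(theta))
    return A_rho, np.diag(theta), A_R, A_I


# Example 6.2.2
J = np.array([[1 + 1j, 1], [0, 1 + 1j]])
A_rho, A_theta, A_R, A_I = polar_from_jordan(J)
print(np.round(A_rho, 4))
print(np.round(A_R, 4))
```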


6.3 Computational Method for Determining the Amplitude and Phase of a Matrix

One way to determine the matrices $A_\rho$ and $A_\theta$ is to find the eigenvalues and eigenvectors of $A$ and then apply the definitions in Section 6.2. High-quality algorithms such as the QR algorithm [54] and LINPACK [50] are available for the eigenvalue step. In this section, a computational method is developed for finding $A_\rho$ and $A_\theta$ without directly involving the eigenvalues and eigenvectors of $A$. The method also requires no prior knowledge of the structure of the Jordan matrix of $A$.

The development is based on the matrix-sign function [9,26] of $A$, which preserves the eigenvectors of a complex matrix (which may be defective). The (scalar) sign function of a complex variable $\lambda$ with $\mathrm{Re}(\lambda) \neq 0$ is defined by

$$\mathrm{sign}(\lambda) = \frac{\lambda}{\sqrt{\lambda^{2}}} = \begin{cases} +1 & \text{when } \mathrm{Re}(\lambda) > 0, \\ -1 & \text{when } \mathrm{Re}(\lambda) < 0, \end{cases} \tag{6.6}$$

where $\sqrt{\lambda^{2}}$ is the principal value of the square root of $\lambda^{2}$.

Following the definition in (6.6), the matrix-sign function of the matrix $A$ is defined as $\mathrm{Sign}(A) = A(\sqrt{A^{2}})^{-1}$, with $\mathrm{Re}(\sigma(A)) \neq 0$. Here $\sqrt{A^{2}}$ is the principal square root of the matrix $A^{2}$. The computational algorithm for finding $\mathrm{Sign}(A)$ can be found in Chapter 4.
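The matrix-sign function can be computed without eigenvalues. The Chapter 4 algorithm is not reproduced in this section, so the sketch substitutes the standard Newton iteration $S_{k+1} = (S_k + S_k^{-1})/2$ with $S_0 = A$. This iteration converges to $\mathrm{Sign}(A)$ when no eigenvalue lies on the imaginary axis. The function name and test matrices are hypothetical.

```python
import numpy as np


def matrix_sign(A, tol=1e-11, max_iter=100):
    """Sign(A) by the Newton iteration S <- (S + S^{-1})/2, S_0 = A.

    Assumes Re(sigma(A)) != 0. Convergence is quadratic near the limit.
    """
    S = np.array(A, dtype=complex)
    for _ in range(max_iter):
        S_next = 0.5 * (S + np.linalg.inv(S))
        if np.linalg.norm(S_next - S, 1) <= tol * np.linalg.norm(S_next, 1):
            return S_next
        S = S_next
    raise RuntimeError("matrix-sign iteration did not converge")


A = np.array([[2.0, 1.0], [0.0, -3.0]])      # eigenvalues 2 and -3
S = matrix_sign(A)
print(np.round(S.real, 6))
```

$(I \pm \mathrm{Sign}(A))/2$ are the spectral projectors onto the eigenvalues with positive and negative real parts, which is what the shifting procedure below relies on.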

It is well known that the imaginary parts of the eigenvalues of $A$, and the associated eigenvectors of $A$, are invariant under a horizontal translation of $A$ along the real axis by a real value $\gamma$. That is, $\mathrm{Im}(\sigma(A - \gamma I_m)) = \mathrm{Im}(\sigma(A))$ and $A - \gamma I_m = M[(\mathrm{Re}(J) - \gamma I_m) + j\,\mathrm{Im}(J)]M^{-1}$, where $I_m$ is an $m \times m$ identity matrix.

Suppose the real value $\gamma$ is selected so that $\mathrm{Re}(\sigma(A - \gamma I_m)) = 0$. Then the shifted matrix $A_j = A - \gamma I_m$ contains only the imaginary eigenvalues $j\beta_i$ ($i = 1, 2, \ldots, m$) and $A_1$. Hence, the desired matrix $A_{Rd}$ in (6.5) becomes $A - A_j$. The matrix-sign function of $A$ is utilized to determine $A_j$, and hence $A_{Rd} = A - A_j$.

To get the desired matrix $A_{Id}$ in (6.5), we multiply the matrix $A$ by $j$ and repeat the above procedure to compute a new matrix $\hat{A}_j$. Thus, we have

$$A_{Id} = \hat{A}_j - jA, \tag{6.7a}$$

and the desired matrix $A_1$ in (6.5) becomes

$$A_1 = A - (A_{Rd} + jA_{Id}). \tag{6.7b}$$

An alternative representation of $A_{Id}$ is

$$A_{Id} = -j\tilde{A}_{Id}, \tag{6.7c}$$

where $\tilde{A}_{Id} = jA_{Id}$.

Hence, the original matrix $A$ can be represented as

$$A = (A_{Rd} + A_1) + jA_{Id}, \tag{6.7d}$$

and we have obtained the desired matrices $A_{Rd}$, $A_1$ and $A_{Id}$ for use in Definition 6.2.3.

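The computational algorithm for finding $A_j\ (= A - \gamma I_m)$ is listed as follows.

Algorithm 6.3.1

Given: $A$ is a complex $m \times m$ matrix with eigenvalues $\lambda_i = \alpha_i + j\beta_i$, $i = 1, 2, \ldots, m$, $|\lambda_i| \neq 0$. The shift $\gamma$ is a small positive value, $\gamma < \epsilon$, where $\epsilon$ is an acceptable error tolerance. Find $A_j$.

Algorithm:

$$A_0 = A - \{\mathrm{Re}[\mathrm{trace}(A)]/m\}\, I_m,$$

$$\gamma_k^{+} = \frac{\mathrm{Re}\{\mathrm{trace}[A_k\,(I_m + \mathrm{Sign}(A_k - \gamma I_m))/2]\}}{\mathrm{trace}[(I_m + \mathrm{Sign}(A_k - \gamma I_m))/2]}, \qquad \gamma_k^{-} = \frac{\mathrm{Re}\{\mathrm{trace}[A_k\,(I_m - \mathrm{Sign}(A_k + \gamma I_m))/2]\}}{\mathrm{trace}[(I_m - \mathrm{Sign}(A_k + \gamma I_m))/2]},$$

$$A_{k+1} = A_k - \gamma_k^{+}\,\frac{I_m + \mathrm{Sign}(A_k - \gamma I_m)}{2} - \gamma_k^{-}\,\frac{I_m - \mathrm{Sign}(A_k + \gamma I_m)}{2}, \qquad k = 0, 1, 2, \ldots,$$

until

$$\mathrm{trace}\left[\frac{I_m + \mathrm{Sign}(A_k - \gamma I_m)}{2}\right] = \mathrm{trace}\left[\frac{I_m - \mathrm{Sign}(A_k + \gamma I_m)}{2}\right] = 0,$$

where $\gamma_k^{+}$, $\gamma$ and $\gamma_k^{-}$ are scalars chosen as follows:

$$0 < \gamma < \min\{|\mathrm{Re}(\lambda)| \mid \lambda \in \sigma(A_k),\ \mathrm{Re}(\lambda) \neq 0\};$$

- $\gamma_k^{+}$ is the arithmetic mean of $\{\mathrm{Re}(\lambda) \mid \lambda \in \sigma(A_k),\ \mathrm{Re}(\lambda) > 0\}$;
- $\gamma_k^{-}$ is the arithmetic mean of $\{\mathrm{Re}(\lambda) \mid \lambda \in \sigma(A_k),\ \mathrm{Re}(\lambda) < 0\}$.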
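The following is a sketch of Algorithm 6.3.1 as reconstructed above, together with (6.7a). It makes several assumptions:

- Sign is computed by the Newton iteration, not the Chapter 4 algorithm.
- $\gamma = 10^{-3}$.
- The projector traces are rounded to integers.
- A zero trace simply drops that group's shift.

The function names are hypothetical. The usage part builds $A$ from a known modal matrix, so that the expected $A_{Rd}$ and $A_{Id}$ are available for comparison.

```python
import numpy as np


def matrix_sign(A, tol=1e-11, max_iter=200):
    """Newton iteration for Sign(A); assumes no eigenvalue on the imaginary axis."""
    S = np.array(A, dtype=complex)
    for _ in range(max_iter):
        S_next = 0.5 * (S + np.linalg.inv(S))
        if np.linalg.norm(S_next - S, 1) <= tol * np.linalg.norm(S_next, 1):
            return S_next
        S = S_next
    raise RuntimeError("matrix-sign iteration did not converge")


def shift_real_parts_to_zero(A, gamma=1e-3, max_steps=50):
    """Sketch of Algorithm 6.3.1: returns A_j, whose eigenvalues are the j*beta_i."""
    m = A.shape[0]
    I = np.eye(m)
    Ak = np.array(A, dtype=complex) - (np.trace(A).real / m) * I
    for _ in range(max_steps):
        P_pos = 0.5 * (I + matrix_sign(Ak - gamma * I))   # projector: Re(lambda) > gamma
        P_neg = 0.5 * (I - matrix_sign(Ak + gamma * I))   # projector: Re(lambda) < -gamma
        n_pos = int(round(np.trace(P_pos).real))           # size of each eigenvalue group
        n_neg = int(round(np.trace(P_neg).real))
        if n_pos == 0 and n_neg == 0:                       # stopping rule
            return Ak
        # gamma^+ and gamma^-: mean real part of the eigenvalues in each group
        g_pos = np.trace(Ak @ P_pos).real / n_pos if n_pos else 0.0
        g_neg = np.trace(Ak @ P_neg).real / n_neg if n_neg else 0.0
        Ak = Ak - g_pos * P_pos - g_neg * P_neg
    raise RuntimeError("real parts were not driven to zero")


def split_real_imag(A, **kw):
    """A_Rd = A - A_j, and A_Id = hat(A)_j - jA as in (6.7a), with hat(A)_j computed from jA."""
    A_Rd = A - shift_real_parts_to_zero(A, **kw)
    A_Id = shift_real_parts_to_zero(1j * A, **kw) - 1j * A
    return A_Rd, A_Id


# Hypothetical test: A = M diag(lam) M^{-1} with a known modal matrix M.
M = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
lam = np.array([1 + 2j, 3 - 1j, -2 + 0.5j])
A = M @ np.diag(lam) @ np.linalg.inv(M)
A_Rd, A_Id = split_real_imag(A)
print(np.allclose(A_Rd, M @ np.diag(lam.real) @ np.linalg.inv(M)))
```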

The amplitude of $A$ can be represented in terms of $A_{Rd}$, $A_{Id}$ and $A_1$ as

$$A_\rho = (A_{Rd}^{2} + A_{Id}^{2})^{1/2} + A_1\exp(-jA_\theta) = A_{\rho d} + A_1\exp(-jA_\theta). \tag{6.8}$$

Since all eigenvalues of $A_{\rho d}$ are positive real values, we can compute $A_{\rho d}$ via either the algorithm in [61] or the Newton-Raphson algorithm due to [54].
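For the principal square root needed in (6.8), the sketch below uses the Denman-Beavers iteration. This is a Newton-type method, chosen here as an assumption; it is not claimed to be the algorithm of [61] or [54]. It is applied to Example 6.2.2, where $A_{Rd}^{2} + A_{Id}^{2} = 2I$.

```python
import numpy as np


def sqrtm_denman_beavers(B, tol=1e-11, max_iter=100):
    """Principal square root of B by the Denman-Beavers iteration.

    Y_{k+1} = (Y_k + Z_k^{-1})/2 and Z_{k+1} = (Z_k + Y_k^{-1})/2, with Y_0 = B and Z_0 = I,
    so that Y_k -> B^{1/2} and Z_k -> B^{-1/2}. Assumes B has no eigenvalues on the
    closed negative real axis (here: positive real eigenvalues).
    """
    Y = np.array(B, dtype=complex)
    Z = np.eye(Y.shape[0], dtype=complex)
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        if np.linalg.norm(Y_next - Y, 1) <= tol * np.linalg.norm(Y_next, 1):
            return Y_next
        Y, Z = Y_next, Z_next
    raise RuntimeError("Denman-Beavers iteration did not converge")


# A_rho_d for Example 6.2.2: A_Rd = A_Id = I, so A_Rd^2 + A_Id^2 = 2I.
A_Rd = np.eye(2)
A_Id = np.eye(2)
A_rho_d = sqrtm_denman_beavers(A_Rd @ A_Rd + A_Id @ A_Id)
print(np.round(A_rho_d.real, 6))
```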

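As discussed in Example 6.2.1, determining the phase of $A$ is not a simple matter, with or without prior knowledge of the eigenvalues of $A$. Based on the property of the principal $n$th root of a matrix shown in Definition 6.1.1, we propose a new method for approximating $A_\theta$, as follows. Rewriting the matrix-valued function in (6.5) gives

$$A_\theta = \tan^{-1}(A_{Rd}^{-1}A_{Id}) = \tan^{-1}\left(A_{Rd}^{-1}(-j\tilde{A}_{Id})\right) = \tan^{-1}\left(-jA_{Rd}^{-1}\tilde{A}_{Id}\right) = \tan^{-1}(-jX), \tag{6.9}$$

where $X = A_{Rd}^{-1}\tilde{A}_{Id}$ and $\sigma(X) = \{\lambda_i = 0 + j(\beta_i/\alpha_i)$, for $i = 1, 2, \ldots, m\}$. The matrix-valued function $A_\theta$ in (6.9) can be represented by an infinite series as

$$A_\theta = \tan^{-1}(-jX) = -j\left[X + \frac{X^{3}}{3} + \frac{X^{5}}{5} + \frac{X^{7}}{7} + \cdots\right] \quad \text{as } |\sigma(X^{2})| < 1 \text{ and } \mathrm{Re}(\sigma(A)) > 0. \tag{6.10a}$$

Thus, if $|\sigma(X^{2})| \ll 1$ and $\mathrm{Re}(\sigma(A)) > 0$, approximations of $A_\theta$ can be obtained by taking only the first several terms as

$$A_\theta \approx -j\left[X + \frac{X^{3}}{3} + \cdots + \frac{X^{n}}{n}\right]. \tag{6.10b}$$

Since $|\sigma(X^{2})|$ is not always less than unity and $\mathrm{Re}(\sigma(A))$ is not always positive, it is difficult to obtain $A_\theta$ via the direct-truncation method. To overcome this difficulty and to guarantee the convergence of the infinite series in (6.10a), we determine the principal 4th root of $A$ (denoted by $A^{(4)}$) and the associated matrices $A_{Rd}^{(4)}$, $A_1^{(4)}$ and $A_{Id}^{(4)}$ as follows:

$$A^{(4)} = \sqrt[4]{A} = (A_{Rd}^{(4)} + jA_{Id}^{(4)}) + A_1^{(4)}, \tag{6.11a}$$

where the matrices $A_{Rd}^{(4)}$ and $A_{Id}^{(4)}$ can be obtained by Algorithm 6.3.1 with the matrix $A$ replaced by $A^{(4)}$. Thus, the matrix $A_{\rho d}^{(4)}$ becomes

$$A_{\rho d}^{(4)} = \left[(A_{Rd}^{(4)})^{2} + (A_{Id}^{(4)})^{2}\right]^{1/2}, \tag{6.11b}$$

and the phase of $A^{(4)}$, denoted by $A_\theta^{(4)}$, can be fully represented by an infinite series as

$$A_\theta^{(4)} = \tan^{-1}\left(-j(A_{Rd}^{(4)})^{-1}\tilde{A}_{Id}^{(4)}\right) = \tan^{-1}(-jX') = -j\left[X' + \frac{X'^{3}}{3} + \frac{X'^{5}}{5} + \cdots\right], \quad |\sigma(X'^{2})| < 1 \text{ and } \mathrm{Re}(\sigma(A^{(4)})) > 0, \tag{6.12a}$$

where $X' = (A_{Rd}^{(4)})^{-1}\tilde{A}_{Id}^{(4)}$. From Definition 6.1.1, all eigenvalues of $A^{(4)}$ (that is, of $\sqrt[4]{A}$) lie inside the sector $(-\pi/4, \pi/4)$ of the $\lambda$-plane. Therefore, the convergence conditions in (6.12a) are always satisfied. Hence, approximations of $A_\theta^{(4)}$ can be obtained by truncating the infinite series in (6.12a) as

$$A_\theta^{(4)} \approx -j\left[X' + \frac{X'^{3}}{3} + \cdots + \frac{X'^{n}}{n}\right]. \tag{6.12b}$$

The desired matrices $A_\rho$ and $A_\theta$ in (6.5) can be obtained as

$$A_\rho = (A_{Rd}^{2} + A_{Id}^{2})^{1/2} + A_1\exp(-jA_\theta) \tag{6.13a}$$

and

$$A_\theta = 4A_\theta^{(4)}. \tag{6.13b}$$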

It is well known that the Taylor series for $\tan^{-1}(x)$ converges too slowly to be of much use in numerical computation when the argument $x$ is close to unity. For example, calculating $\tan^{-1}(0.9)$ to five significant digits requires the first 29 terms of the Taylor approximation. A better approximation of $A_\theta$ can be obtained via the following geometric-series method.

Rewriting  (6.12a)  yields 


=  -i 


A”  +  Ix^ 
3 


-I- 


=  -J 


A  +  Ix^  +  • .  •  +  -  A’'  A”-"^ 

3  n  n  +  2 


-"(-I) 


A 


n-!-2t 


(6.14a) 
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The  weighting  factor  of  the  term  in  the  infinite  series  in  (6.14a)  can  be 

approximated  by  the  following  approximation, 


■(■*!)'■(»  9 


(6.146) 


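Thus, we have

$$A_\theta^{(q)} \approx -j\left[X + \tfrac{1}{3}X^{3} + \cdots + \tfrac{1}{n-2}X^{n-2} + \tfrac{1}{n}X^{n}\left(I_m - \frac{X^{2}}{1 + 2/n}\right)^{-1}\right] \quad \text{for } |\sigma(X^{2})| < 1 + \frac{2}{n}. \qquad (6.14c)$$

Note that $[I_m - X^{2}/(1 + 2/n)]^{-1}$ is the sum of a geometric series, which converges when $|\sigma(X^{2})| < 1 + 2/n$.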

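Some approximations of $A_\theta^{(q)}$ from (6.14c) for n = 1, 3 and 5 are listed below:

$$A_\theta^{(q)} \approx -jX\left(I_m - \tfrac{1}{3}X^{2}\right)^{-1} \approx -jX\left(I_m - \tfrac{4}{15}X^{2}\right)\left(I_m - \tfrac{3}{5}X^{2}\right)^{-1} \qquad (6.14d)$$

$$\approx -jX\left(I_m - \tfrac{8}{21}X^{2} - \tfrac{4}{105}X^{4}\right)\left(I_m - \tfrac{5}{7}X^{2}\right)^{-1} \qquad (6.14e)$$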


Smaller values of $|\sigma(X^{2})|$ result in better approximations of $A_\theta^{(q)}$. Since $|\sigma(X)| < 1$, the maximum error occurs when $|\sigma(X)| = 1$. Let the (scalar) X in (6.14e) have unit magnitude, X = j (so that the phase of $A^{(q)}$ is π/4), and let q = 4; then $A_\theta = 4A_\theta^{(4)} \approx 3.13333$ rad, whereas the exact value of $A_\theta$ is π ≈ 3.14159 rad. If we first compute the principal square root of $A^{(q)}$, which halves its phase, and then use (6.14e), we obtain $A_\theta \approx 3.14157$ rad, which is close to the exact value.
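The scalar case makes the accuracy figures above easy to check. The Python sketch below is an illustration only: it assumes a single eigenvalue, so that X reduces to the scalar j·tan φ (φ being the phase of the qth root) and X² = -tan²φ. The helper name `phase_geometric` is hypothetical. Under that assumption, q = 4 with tan φ = 1 should give about 3.13333 rad, and q = 8 (one extra square root) about 3.14157 rad.

```python
import math


def phase_geometric(t: float, n: int) -> float:
    """Scalar form of (6.14c) with X = j*t, so that -jX = t and X^2 = -t^2.

    Sums the odd terms X, X^3/3, ..., X^(n-2)/(n-2) and the geometric-series
    tail (1/n) X^n (1 - n/(n+2) X^2)^(-1), then multiplies by -j, i.e. by t/X.
    n must be odd.
    """
    x2 = -t * t
    partial = sum(x2 ** ((k - 1) // 2) / k for k in range(1, n - 1, 2))
    tail = (x2 ** ((n - 1) // 2) / n) / (1.0 - n / (n + 2) * x2)
    return t * (partial + tail)


# Worst case for q = 4: the qth root has phase pi/4, i.e. tan(phi) = 1.
approx_q4 = 4 * phase_geometric(1.0, 5)
# One more principal square root halves the phase (effectively q = 8).
approx_q8 = 8 * phase_geometric(math.tan(math.pi / 8), 5)
print(f"{approx_q4:.5f} {approx_q8:.5f} {math.pi:.5f}")
```

The n = 5 tail is what (6.14e) encodes; using n = 1 or n = 3 in the same function reproduces the two forms in (6.14d).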
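The Taylor series of $\cos(A_\theta^{(q)})$, $\sin(A_\theta^{(q)})$ and $\exp(jA_\theta^{(q)})$, which often give good approximations, are listed as follows:

$$\cos\left(A_\theta^{(q)}\right) = I_m - \frac{[A_\theta^{(q)}]^{2}}{2!} + \frac{[A_\theta^{(q)}]^{4}}{4!} - \frac{[A_\theta^{(q)}]^{6}}{6!} + \cdots, \qquad (6.15a)$$

$$\sin\left(A_\theta^{(q)}\right) = A_\theta^{(q)} - \frac{[A_\theta^{(q)}]^{3}}{3!} + \frac{[A_\theta^{(q)}]^{5}}{5!} - \frac{[A_\theta^{(q)}]^{7}}{7!} + \cdots, \qquad (6.15b)$$

and

$$\exp\left(jA_\theta^{(q)}\right) = \cos\left(A_\theta^{(q)}\right) + j\sin\left(A_\theta^{(q)}\right). \qquad (6.15c)$$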


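To determine $\cos(A_\theta)$ and $\sin(A_\theta)$ from $\cos(A_\theta^{(q)})$ and $\sin(A_\theta^{(q)})$, we apply the following multiple-angle formulas:

$$\cos(n\phi) = \frac{1}{2}\left\{(2\cos\phi)^{n} - \frac{n}{1}(2\cos\phi)^{n-2} + \frac{n}{2}\binom{n-3}{1}(2\cos\phi)^{n-4} - \frac{n}{3}\binom{n-4}{2}(2\cos\phi)^{n-6} + \cdots\right\} \qquad (6.16a)$$

and

$$\sin(n\phi) = \sin\phi\left\{(2\cos\phi)^{n-1} - \binom{n-2}{1}(2\cos\phi)^{n-3} + \binom{n-3}{2}(2\cos\phi)^{n-5} - \cdots\right\}, \qquad (6.16b)$$

where $\phi = A_\theta^{(q)}$ and n = q.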

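If the first four dominant terms are used to approximate the infinite series in (6.15a) and (6.15b), we obtain, for q = 4,

$$\cos(A_\theta) \approx 8\left[\cos\left(A_\theta^{(q)}\right)\right]^{4} - 8\left[\cos\left(A_\theta^{(q)}\right)\right]^{2} + I_m, \qquad (6.16c)$$

$$\sin(A_\theta) \approx 4\sin\left(A_\theta^{(q)}\right)\cos\left(A_\theta^{(q)}\right) - 8\left[\sin\left(A_\theta^{(q)}\right)\right]^{3}\cos\left(A_\theta^{(q)}\right). \qquad (6.16d)$$

Hence, we have

$$\exp(jA_\theta) = \cos(A_\theta) + j\sin(A_\theta). \qquad (6.16e)$$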
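The worst case discussed next (φ = π/4, q = 4, θ = qφ = π) can be checked with a scalar stand-in for the matrix computation. The Python sketch below evaluates the four-term series (6.15a)-(6.15b) and then applies (6.16c)-(6.16e). The helper names are hypothetical. In the matrix case the same expressions would be evaluated with matrix powers, which is legitimate because all the matrices involved are functions of $A_\theta^{(q)}$ and therefore commute.

```python
import cmath
import math


def cos_series4(phi: float) -> float:
    """First four terms of the cosine series (6.15a)."""
    return 1 - phi**2 / 2 + phi**4 / 24 - phi**6 / 720


def sin_series4(phi: float) -> float:
    """First four terms of the sine series (6.15b)."""
    return phi - phi**3 / 6 + phi**5 / 120 - phi**7 / 5040


q = 4
phi = math.pi / q                          # phase of the qth root
c, s = cos_series4(phi), sin_series4(phi)
cos_theta = 8 * c**4 - 8 * c**2 + 1        # (6.16c)
sin_theta = 4 * s * c - 8 * s**3 * c       # (6.16d)
exp_j_theta = complex(cos_theta, sin_theta)  # (6.16e)

# Compare with the exact values cos(pi) = -1, sin(pi) = 0, exp(j*pi) = -1.
print(cos_theta, sin_theta, abs(exp_j_theta - cmath.exp(1j * math.pi)))
```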

The maximum error occurs when θ = π. In this case, the above procedure gives the approximations cos(π) = -1 ≈ -1, sin(π) = 0 ≈ $4.6 \times 10^{-6}$ and exp(jπ) = -1 ≈ $-1 + j4.6 \times 10^{-6}$, which are quite satisfactory. The procedures for determining the matrices $A_p$, $A_\theta$, $A_R$ and $A_I$ are summarized in the following algorithm.
Algorithm 6.3.2

Step 1.

Compute the principal qth root (q ≥ 4) of A, denoted by $A^{(q)}$, via the algorithms in (3.9).

Step 2.

Find $A_{Rd}^{(q)}$ and $A_{Id}^{(q)}$ via Algorithm 6.3.1 with the matrix A replaced by $A^{(q)}$, and obtain $A_{pd}$ and $A_i$ via the procedures derived in Section 6.3.

Step 3.

Determine $A_\theta$ and $A_p$ as follows:

$$A_p = A_{pd} + A_i\exp(-jA_\theta) = A\exp(-jA_\theta),$$

where

$$A_\theta = qA_\theta^{(q)},$$

$$A_\theta^{(q)} = -j\left[X + \frac{X^{3}}{3} + \frac{X^{5}}{5} + \frac{X^{7}}{7} + \cdots\right] \quad \text{for } |\sigma(X)| < 1 \text{ and } \operatorname{Re}(\sigma(A^{(q)})) > 0,$$

$$A_\theta^{(q)} \approx -jX\left(I_m - \tfrac{1}{3}X^{2}\right)^{-1} \approx -jX\left(I_m - \tfrac{4}{15}X^{2}\right)\left(I_m - \tfrac{3}{5}X^{2}\right)^{-1} \approx -jX\left(I_m - \tfrac{8}{21}X^{2} - \tfrac{4}{105}X^{4}\right)\left(I_m - \tfrac{5}{7}X^{2}\right)^{-1},$$

with $X = j\left(A_{Rd}^{(q)}\right)^{-1}A_{Id}^{(q)}$, and

$$\exp(-jA_\theta) = \cos(A_\theta) - j\sin(A_\theta).$$

The matrices $\cos(A_\theta)$ and $\sin(A_\theta)$ can be obtained by using the approximations of the infinite series in (6.15) and (6.16).

Step 4.

Determine $A_R$ and $A_I$ as

$$A_R = A_p\cos(A_\theta),$$

$$A_I = A_p\sin(A_\theta).$$


6.4  Illustrative  Example 

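Consider a defective real matrix A given by

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -4 & 8 & -8 & 4 \end{bmatrix} = MJM^{-1},$$

where

$$J = \begin{bmatrix} \lambda_1 & 1 & 0 & 0 \\ 0 & \lambda_1 & 0 & 0 \\ 0 & 0 & \lambda_2 & 1 \\ 0 & 0 & 0 & \lambda_2 \end{bmatrix}, \qquad M = \begin{bmatrix} 1 & 0 & 1 & 0 \\ \lambda_1 & 1 & \lambda_2 & 1 \\ \lambda_1^{2} & 2\lambda_1 & \lambda_2^{2} & 2\lambda_2 \\ \lambda_1^{3} & 3\lambda_1^{2} & \lambda_2^{3} & 3\lambda_2^{2} \end{bmatrix},$$

and $\lambda_1 = 1 + j1$ and $\lambda_2 = 1 - j1$.

Find

$A_p$, the amplitude of A,
$A_\theta$, the phase of A,
$A_R$, the projected real part of A, and
$A_I$, the projected imaginary part of A.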

Solution 

From Step 1 in Algorithm 6.3.2, we use the algorithm in (3.9) to obtain

$$A^{(4)} = \begin{bmatrix} 0.697246 & 0.424768 & -0.132240 & 0.026230 \\ -0.104920 & 0.907085 & 0.214929 & -0.027320 \\ 0.109282 & -0.323484 & 1.125649 & 0.105646 \\ -0.422587 & 0.954456 & -1.168657 & 1.548236 \end{bmatrix}.$$
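The printed root can be checked numerically. The NumPy sketch below is an illustration, not the algorithm in (3.9): it raises the six-decimal matrix above to the fourth power, compares the result with A, and compares the trace with the common real part 1.069554 quoted next. Because $A^{(4)}$ is defective, its computed eigenvalues are sensitive to the six-decimal rounding, so the sketch only expects loose agreement with the quoted eigenvalues.

```python
import numpy as np

# Companion matrix A of the example (double eigenvalues 1 + j1 and 1 - j1).
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-4.0, 8.0, -8.0, 4.0]])

# Principal fourth root A^(4) as printed (six decimals).
A4 = np.array([[0.697246, 0.424768, -0.132240, 0.026230],
               [-0.104920, 0.907085, 0.214929, -0.027320],
               [0.109282, -0.323484, 1.125649, 0.105646],
               [-0.422587, 0.954456, -1.168657, 1.548236]])

residual = np.abs(np.linalg.matrix_power(A4, 4) - A).max()
mean_real = np.trace(A4) / 4      # all four eigenvalues share this real part
eigs = np.linalg.eigvals(A4)
print(f"max |(A^(4))^4 - A| = {residual:.1e}, trace/4 = {mean_real:.6f}")
print(np.round(eigs, 4))
```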


It might be interesting to note that $\sigma(A^{(4)}) = \{\lambda_1^{1/4}, \lambda_1^{1/4}, \lambda_2^{1/4}, \lambda_2^{1/4}\} = \{1.069554 + j0.2127475,\ 1.069554 + j0.2127475,\ 1.069554 - j0.2127475,\ 1.069554 - j0.2127475\}$, $\operatorname{Re}(\sigma(A^{(4)})) > 0$, and $|\operatorname{Im}(\sigma(A^{(4)}))/\operatorname{Re}(\sigma(A^{(4)}))| < 1$. From Step 2 in Algorithm 6.3.2, we use Algorithm 6.3.1 to obtain


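$$A_{Rd}^{(4)} = \operatorname{diag}[1.069554,\ 1.069554,\ 1.069554,\ 1.069554]$$

and

$$A_{Id}^{(4)} = \begin{bmatrix} j0.425495 & -j0.638243 & j0.319121 & -j0.106374 \\ j0.425495 & -j0.425495 & j0.212748 & -j0.106374 \\ j0.425495 & -j0.425495 & j0.425495 & -j0.212748 \\ j0.850990 & -j1.276485 & j1.276485 & -j0.425495 \end{bmatrix}.$$

Note that

$$\sigma\left(A_{Id}^{(4)}\right) = \{0.2127475,\ -0.2127475,\ 0.2127475,\ -0.2127475\} \subset \mathbb{R}.$$

Thus, we obtain

$$A_{pd} = \left(A_{pd}^{(4)}\right)^{4} = \operatorname{diag}[1.414214,\ 1.414214,\ 1.414214,\ 1.414214].$$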


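From Step 3, we can compute $A_\theta$ with q = 4 and $X = j\left(A_{Rd}^{(4)}\right)^{-1}A_{Id}^{(4)}$:

$$A_\theta = 4A_\theta^{(4)} = \begin{bmatrix} j1.570796 & -j2.356194 & j1.178097 & -j0.392699 \\ j1.570796 & -j1.570796 & j0.785398 & -j0.392699 \\ j1.570796 & -j1.570796 & j1.570796 & -j0.785398 \\ j3.141593 & -j4.712389 & j4.712389 & -j1.570796 \end{bmatrix},$$

where

$$\sigma(A_\theta) = \{0.7853981,\ -0.7853981,\ 0.7853981,\ -0.7853981\} \subset \mathbb{R}.$$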
Also, we get

$$A_i = A - (A_{Rd} + jA_{Id}) = \begin{bmatrix} 1 & -2 & 1.5 & -0.5 \\ 2 & -3 & 2 & -0.5 \\ 2 & -2 & 1 & 0 \\ 0 & 2 & -2 & 1 \end{bmatrix}$$

and


$$A_p = A_{pd} + A_i\exp(-jA_\theta) = \begin{bmatrix} 1.414214 & -0.707107 & 0.707109 & -0.353553 \\ 1.414214 & -1.414214 & 2.121320 & -0.707109 \\ 2.828427 & -4.242641 & 4.242641 & -0.707109 \\ 2.828427 & -2.828427 & 1.414213 & 1.414214 \end{bmatrix},$$

where

$$\sigma(A_p) = \{1.414214,\ 1.414214,\ 1.414214,\ 1.414214\} \subset \mathbb{R}.$$
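From Step 4, we obtain

$$A_R = A_p\cos(A_\theta) = \begin{bmatrix} 1 & -0.5 & 0.5 & -0.25 \\ 1 & -1 & 1.5 & -0.5 \\ 2 & -3 & 3 & -0.5 \\ 2 & -2 & 1 & 1 \end{bmatrix},$$

with $\sigma(A_R) = \{1, 1, 1, 1\} \subset \mathbb{R}$, and

$$A_I = A_p\sin(A_\theta) = \begin{bmatrix} j1 & -j1.5 & j0.5 & -j0.25 \\ j1 & -j1 & j0.5 & -j0.5 \\ j2 & -j3 & j3 & -j1.5 \\ j6 & -j10 & j9 & -j3 \end{bmatrix},$$

with $\sigma(A_I) = \{1, -1, 1, -1\} \subset \mathbb{R}$.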

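It can be shown that $A = A_R + jA_I = A_p\exp(jA_\theta)$, $A_p = (A_R^{2} + A_I^{2})^{1/2}$ and $A_\theta = \tan^{-1}(A_R^{-1}A_I)$.

Note that $A_R \neq \operatorname{Re}(A) = A$, $A_I \neq \operatorname{Im}(A) = 0_4$, $A_p \neq [(\operatorname{Re}(A))^{2} + (\operatorname{Im}(A))^{2}]^{1/2} = [A^{2} + 0_4]^{1/2} = A$, and $A_\theta \neq \tan^{-1}[(\operatorname{Re}(A))^{-1}\operatorname{Im}(A)] = \tan^{-1}(A^{-1}0_4) = 0_4$.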
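Steps 3 and 4 of Algorithm 6.3.2 can be replayed on the printed numbers. The NumPy sketch below takes the phase matrix $A_\theta$ listed above as given rather than recomputing it. It evaluates cos and sin of $A_\theta^{(4)} = A_\theta/4$ with the four-term series (6.15a)-(6.15b), lifts them to $A_\theta$ with (6.16c)-(6.16d), and forms $A_p$, $A_R$ and $A_I$. The final checks compare the results with the matrices above and with $A = A_R + jA_I$. Variable names are illustrative.

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-4.0, 8.0, -8.0, 4.0]])

# Phase A_theta = 4 A_theta^(4) as printed above (purely imaginary entries).
A_theta = 1j * np.array([[1.570796, -2.356194, 1.178097, -0.392699],
                         [1.570796, -1.570796, 0.785398, -0.392699],
                         [1.570796, -1.570796, 1.570796, -0.785398],
                         [3.141593, -4.712389, 4.712389, -1.570796]])

q = 4
I = np.eye(4)
Phi = A_theta / q                                        # A_theta^(q)
Phi2 = Phi @ Phi
Phi4 = Phi2 @ Phi2
Phi6 = Phi4 @ Phi2
C = I - Phi2 / 2 + Phi4 / 24 - Phi6 / 720                # (6.15a), four terms
S = Phi @ (I - Phi2 / 6 + Phi4 / 120 - Phi6 / 5040)     # (6.15b), four terms
C2 = C @ C
cos_t = 8 * C2 @ C2 - 8 * C2 + I                         # (6.16c), q = 4
sin_t = 4 * S @ C - 8 * S @ S @ S @ C                    # (6.16d), q = 4

A_p = A @ (cos_t - 1j * sin_t)     # Step 3: A_p = A exp(-j A_theta)
A_R = A_p @ cos_t                  # Step 4
A_I = A_p @ sin_t

A_R_printed = np.array([[1, -0.5, 0.5, -0.25],
                        [1, -1, 1.5, -0.5],
                        [2, -3, 3, -0.5],
                        [2, -2, 1, 1]])
A_I_printed = 1j * np.array([[1, -1.5, 0.5, -0.25],
                             [1, -1, 0.5, -0.5],
                             [2, -3, 3, -1.5],
                             [6, -10, 9, -3]])
print(np.round(A_p.real, 6))
print(np.abs(A_R - A_R_printed).max(), np.abs(A_I - A_I_printed).max())
print(np.abs(A_R + 1j * A_I - A).max())
```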

6.5  Conclusion 

The amplitude and phase of a complex matrix and the projected real and imaginary parts of the complex matrix have been defined, and computational methods for finding these matrices have been proposed in this chapter. The method relies on an important property of the matrix-sign function: the matrix-sign functions associated with a shifted complex matrix preserve the eigenvectors of the original matrix. Using this property, the algorithm for finding the principal nth root of a complex matrix has been employed to compute the amplitude and phase of the original complex matrix. The newly developed geometric-series method can be used to approximate the matrix-valued function $\tan^{-1}(X)$, where X is a matrix. Questions of computational cost have not, however, been considered in any detail. Applications of the amplitude and phase of a complex matrix to systems theory [32] are being investigated.
Chapter 7

Application  of  the  Principal  nth  Root  Method  to 
Large-scale  Discrete  Systems  Design 


A multi-stage pseudo-continuous-time state-space method is developed for designing large-scale discrete systems that do not explicitly exhibit a two- or multi-time-scale structure. The designed pseudo-continuous-time regulator places the eigenvalues of the closed-loop discrete system within the common region of a circle (concentric with the unit circle) and a logarithmic spiral in the complex z-plane, without explicitly using the open-loop eigenvalues of the given system. The proposed method requires the solution of only small-order Riccati equations at each stage of the design. An illustrative example is presented to demonstrate the effectiveness of the proposed procedures [78].


7.1  Introduction 

Physical realizations of engineering systems generally result in large-scale models. In most cases, it is impractical to analyze and design the large-scale system model directly. The original system therefore needs to be decomposed into decoupled subsystems, each with its own distinct characteristics, so that the resulting model has a completely decoupled multi-time-scale structure. Existing approaches to the decomposition of large-scale systems include aggregation [55], multi-time scales [56] and modal analysis [57]. However, most of these appear to be restricted to continuous-time systems. The corresponding problem for large-scale discrete-time systems has received very little attention [58-60]. Mahmoud et al. [58] derived a matrix-norm condition for separating large-scale discrete-time systems into two time scales without initially assuming that such a structure is available. In computational practice, however, this condition might not always be feasible to satisfy. Shieh et al. [53] have developed an algebraic method based on the matrix-sign function [9]. The method separates the slow (dominant) modes from the fast (nondominant) modes, giving a two-time-scale structure, for large-scale multivariable systems, both continuous and discrete. The matrix-sign function algorithm has been used for the following tasks:

- block-diagonalization and block-triangularization [37] of a large-scale system, i.e., decomposing the system into parallel and cascaded structures;
- solving nonlinear Riccati equations, which often appear in the feedback design of systems based on linear quadratic theory;
- model conversions of systems via the computation of the principal qth root of the system matrix [29,37].

Recently, fast and stable algorithms have been developed for computing the matrix-sign function [29] and the principal qth root of a complex matrix [37]. The latter can in turn be used for discrete-to-continuous model conversion. These algorithms are used here to build a multi-stage design procedure for discrete controllers with pole assignment in a specified region of the complex z-plane.

The optimal linear quadratic (LQ) design method has several good properties. For instance, the closed-loop system is stable and has good robustness properties, provided the weighting matrices satisfy certain positivity conditions [62]. The transient behavior of the closed-loop system is, however, difficult to determine, because the relation between the weighting matrices and the closed-loop poles is complex. As a result, the weighting matrices have to be determined by trial and error. Pole-placement methods have the advantage that the closed-loop poles can be specified. Their drawback is that, for multivariable systems, the choice of feedback is not unique. Further, placing the poles at predetermined locations [63] is too restrictive: for nonlinear systems, the exact closed-loop pole locations might be difficult to attain for each operating condition. In general, it therefore suffices to place the poles within a specified region. Regional pole assignment is also well suited to trading off eigenvalue locations, actuator-signal magnitudes and robustness requirements, such as robustness against large parameter variations, sensor failures, implementation inaccuracies and gain reduction [13]. In this chapter, we consider the common region of a circle and a logarithmic spiral in the z-plane (Fig. 7.2) for pole assignment. This is equivalent to the hatched sector region in the s-plane shown in Fig. 7.1. It is well known that if the poles of a system lie within these regions, the system responses converge at an appropriate speed and any vibrating modes are well damped.

The problem of designing feedback gains that optimally place all the poles of a closed-loop system within a specified region was first studied by Anderson and Moore [62]. They used a shifted system matrix to obtain an optimal closed-loop system whose eigenvalues lie in the open half-plane to the left of a vertical line through the negative real axis. Shieh et al. [64,65] extended this idea to optimally place the poles within a vertical strip as well as a horizontal strip in the left-half plane. Kawasaki and Shimemura [66] proposed an iterative procedure that places the poles inside a hyperbola in the left-half plane, which approximates the sector region shown in Fig. 7.1. In [67], a pseudo-continuous-time method was developed to place the eigenvalues of a discrete system within the hatched region of Fig. 7.2. That method, however, requires the solution of full-order Riccati equations, which could be computationally difficult for large-scale systems. It also uses the Luenberger transformation, which is sometimes numerically unstable, to transform the full-order discrete-time system to its equivalent canonical form so that the pole-placement discrete feedback gain can be determined. In this chapter, only reduced-order Riccati equations need to be solved at each stage of the design and, in most cases, the transformation to the general canonical forms is avoided.

The material in this chapter is organized as follows. Section 7.2 reviews the results associated with the design of a linear quadratic regulator that optimally places the closed-loop eigenvalues of a continuous-time system on or within the hatched region of Fig. 7.1. Section 7.3 first introduces the matrix-sign-function method for block-decomposing a large-scale discrete-time system into a multi-time-scale structure. It then briefly reviews the model-conversion techniques. Finally, it presents a pseudo-continuous-time multi-stage design procedure for large-scale discrete systems decomposed into a multi-time-scale structure, with pole placement on or within the hatched region of Fig. 7.2. An illustrative example is given in Section 7.4 to demonstrate the effectiveness of the proposed design procedure, and the conclusions are summarized in Section 7.5. Some computational algorithms are given in Appendix A.

7.2 Continuous-time Optimal Quadratic Regulators with Pole-placement

Consider the linear controllable continuous-time system described by

$$\dot{x}(t) = Ax(t) + Bu(t), \quad x(0), \qquad (7.1)$$

where x(t) and u(t) are the n × 1 state vector and the m × 1 input vector, respectively, and A and B are constant matrices of appropriate dimensions. Let the quadratic cost function for the system in (7.1) be

$$J = \int_{0}^{\infty}\left(x^{T}(t)Qx(t) + u^{T}(t)Ru(t)\right)dt, \qquad (7.2)$$


where the weighting matrices Q and R are n × n nonnegative-definite and m × m positive-definite symmetric matrices, respectively. The feedback-control law that minimizes the performance index in (7.2) is given by [62]

$$u(t) = -Kx(t) + r(t) = -R^{-1}B^{T}Px(t) + r(t), \qquad (7.3)$$

where K is the feedback gain, r(t) is a reference input, and P, an n × n nonnegative-definite symmetric matrix, is the solution of the Riccati equation

$$PBR^{-1}B^{T}P - PA - A^{T}P - Q = 0_n \qquad (7.4)$$

with (Q, A) detectable. The superscript T and the matrix $0_n$ denote the transpose and the n × n null matrix, respectively. Thus, the resulting closed-loop system becomes

$$\dot{x}(t) = (A - BK)x(t) + Br(t). \qquad (7.5)$$

The eigenvalues of A - BK, denoted by σ(A - BK), lie in the open left half of the complex s-plane. Our objective is to determine Q, R and K so that the closed-loop system in (7.5) has its eigenvalues on or within the hatched region of Fig. 7.1. The important results, along with the design procedure that achieves this objective, are presented below.

Lemma 7.2.1 [62,67]

Let (A, B) be the pair of the given open-loop system in (7.1). Also, let h > 0 represent the prescribed degree of relative stability. Then, the eigenvalues of the closed-loop system $A - BR^{-1}B^{T}P$ lie to the left of the vertical line at -h, with the matrix P being the solution of the Riccati equation

$$PBR^{-1}B^{T}P - P(A + hI_n) - (A + hI_n)^{T}P = 0_n, \qquad (7.6)$$

where $I_n$ is the n × n identity matrix.
Theorem 7.2.1 [67]

Let the given stable system matrix $A \in \mathbb{R}^{n\times n}$ have eigenvalues $\lambda_i^{-}$ ($i = 1, \ldots, n^{-}$) lying in the open sector of Fig. 7.1 and eigenvalues $\lambda_i^{+}$ ($i = 1, \ldots, n^{+}$) lying outside that sector, with $n = n^{-} + n^{+}$. Now, consider the two Riccati equations

$$QBR^{-1}B^{T}Q - Q(-A^{2}) - (-A^{2})^{T}Q = 0_n \qquad (7.7a)$$

and

$$PBR^{-1}B^{T}P - PA - A^{T}P - Q = 0_n. \qquad (7.7b)$$

Then, the closed-loop system

$$A_c = A - rBK = A - rBR^{-1}B^{T}P \qquad (7.8)$$

will contain the invariant eigenvalues $\lambda_i^{-}$ ($i = 1, \ldots, n^{-}$) and at least one additional pair of complex conjugate eigenvalues lying in the open sector of Fig. 7.1, provided the constant gain r in (7.8) satisfies

$$r > \max\left\{\frac{1}{2},\ \frac{b + \sqrt{b^{2} + ac}}{a}\right\}, \qquad (7.9)$$

where a = …, b = … and c = (1/2)…. ■

Remark  7.2.1 

The steady-state solutions of the Riccati equations in (7.6) and (7.7) can be found using matrix-sign function techniques [9,23]; a brief review of these techniques is given in the Appendix. ■
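As an illustration of Remark 7.2.1, the following NumPy sketch solves the Riccati equation (7.4) from the matrix sign of the associated Hamiltonian matrix, using the plain Newton iteration Z ← (Z + Z⁻¹)/2. This is a textbook variant, not necessarily the exact algorithm of [9,23] or of the Appendix. It assumes the Hamiltonian has no eigenvalues on the imaginary axis and that (A, B) is stabilizable. The same routine solves the shifted equation (7.6) when called with A + hIₙ and Q = 0. The double-integrator data and function names are hypothetical.

```python
import numpy as np


def matrix_sign(M, tol=1e-12, max_iter=100):
    """Newton iteration Z <- (Z + Z^-1)/2; M must have no purely imaginary eigenvalues."""
    Z = np.array(M, dtype=float)
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
            return Z_next
        Z = Z_next
    return Z


def riccati_sign(A, B, Q, R):
    """Stabilizing solution of P B R^-1 B^T P - P A - A^T P - Q = 0, as in (7.4)."""
    n = A.shape[0]
    G = B @ np.linalg.solve(R, B.T)
    Ham = np.block([[A, -G], [-Q, -A.T]])          # Hamiltonian matrix
    W = matrix_sign(Ham)
    I = np.eye(n)
    # The stable invariant subspace is spanned by [I; P], so (W + I)[I; P] = 0.
    lhs = np.vstack([W[:n, n:], W[n:, n:] + I])
    rhs = -np.vstack([W[:n, :n] + I, W[n:, :n]])
    P = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    return 0.5 * (P + P.T)                          # symmetrize against round-off


# Hypothetical example: double integrator with R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
P = riccati_sign(A, B, np.eye(2), R)                # (7.4) with Q = I

h = 1.0                                             # prescribed degree of stability
P_h = riccati_sign(A + h * np.eye(2), B, np.zeros((2, 2)), R)   # (7.6)
K_h = np.linalg.solve(R, B.T @ P_h)
print(P)
print(P_h, np.linalg.eigvals(A - B @ K_h))
```

For this toy system, the shifted design moves both closed-loop eigenvalues to -2, which lies to the left of -h = -1, as Lemma 7.2.1 predicts.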

7.2.1  Continuous-time  Design  Procedure 
Step  1. 

Let the given continuous-time system be as in (7.1). Specify h so that the vertical line at -h on the negative real axis marks the boundary beyond which the eigenvalues are to be placed in the sector of Fig. 7.1. Also, assign $A_0 = A$ and the positive-definite matrix R. Set i = 1. If the system is unstable, solve (7.6) to obtain the closed-loop system $A_1 = A - r_0BR^{-1}B^{T}P_0 = A - r_0BK_0$, with $r_0 = 1$. Otherwise (stable system), go to Step 2 with $A_1 = A$, $P_0 = 0_n$ and $r_0 = 0$.

Step 2.

Solve (7.7a) for $Q_i$ with $A := A_i$. Check whether $\operatorname{tr}[BR^{-1}B^{T}Q_i]$ is zero. If it is, go to Step 4 with j = i; otherwise, continue to Step 3. Note that when $\operatorname{tr}[BR^{-1}B^{T}Q_i] = 0$, all eigenvalues of the matrix $A_i$ lie on or within the open sector of Fig. 7.1.
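The sector test behind (7.7a) rests on the map λ ↦ -λ². For a stable A, an eigenvalue λ lies strictly inside the 45° sector about the negative real axis exactly when Re(-λ²) < 0. The eigenvalues of A outside the sector are therefore the unstable eigenvalues of -A². The NumPy sketch below counts them; the function name and the test matrices are hypothetical.

```python
import numpy as np


def count_outside_sector(A, tol=1e-12):
    """For a stable A, count eigenvalues outside the open sector |arg(-lambda)| < 45 deg.

    Uses lambda -> -lambda**2: for Re(lambda) < 0, inside the sector <=> Re(-lambda**2) < 0.
    """
    mu = np.linalg.eigvals(-A @ A)
    return int(np.sum(mu.real >= -tol))


A_in = np.array([[-1.0, 0.5], [-0.5, -1.0]])    # eigenvalues -1 +/- 0.5j (inside)
A_out = np.array([[-1.0, 2.0], [-2.0, -1.0]])   # eigenvalues -1 +/- 2j (outside)
A = np.block([[A_in, np.zeros((2, 2))], [np.zeros((2, 2)), A_out]])
print(count_outside_sector(A_in), count_outside_sector(A))
```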

Step  3. 
Solve (7.7b) for $P_i$ with $A := A_i$ and $Q := Q_i$. Then, evaluate the constant gain $r_i$ using (7.9). The closed-loop system matrix is

$$A_{i+1} = A_i - r_iBR^{-1}B^{T}P_i = A_i - r_iBK_i. \qquad (7.10a)$$

Set i := i + 1, and go to Step 2.

Step 4.

Check whether $\operatorname{tr}[(A_j + hI_n)]^{+}$, the sum of the eigenvalues to the right of the vertical line at -h, is zero. If it is, go to Step 5 with $P_{j+1} = 0_n$ and $r_{j+1} = 0$. Otherwise, solve (7.6) for $P_{j+1}$ with $A := A_j$ and obtain the closed-loop system $A_{j+1} = A_j - r_{j+1}BR^{-1}B^{T}P_{j+1} = A_j - r_{j+1}BK_{j+1}$, with $r_{j+1} = 1$ and $K_{j+1} = R^{-1}B^{T}P_{j+1}$.

Step 5.

The designed closed-loop system is

$$A_0 - \sum_{k=0}^{j+1} r_kBR^{-1}B^{T}P_k, \qquad (7.10b)$$

and its eigenvalues lie in the hatched region of Fig. 7.1. Note that the system matrix in (7.10b) equals the system matrix in (7.5), $A - BR^{-1}B^{T}P$, where P is the solution of the Riccati equation in (7.4) with

$$Q = 2h(P_0 + P_{j+1}) + \sum_{i=1}^{j}\left(Q_i + \Delta r_iP_iBR^{-1}B^{T}P_i\right)r_i. \qquad (7.10c)$$

In the above equation, $\Delta r_i = r_i - 1$, and the matrix R is as originally assigned.

Also, the optimal continuous-time regulator can be given as

$$u(t) = -\left(\sum_{i=0}^{j+1} r_iK_i\right)x(t) + r(t) = -Kx(t) + r(t), \qquad (7.10d)$$

where r(t) is any reference input and K is the desired state-feedback gain. ■

7.3  Pseudo-continuous-time  Pole-placement  Regulators 

In this section, we first consider the block decomposition of a large-scale discrete-time system, using the matrix-sign function method [9] to block-diagonalize the system into a multi-time-scale structure. Next, we review some existing model-conversion methods [28,68] for transforming a continuous-time model into an equivalent discrete-time model, and vice versa. Finally, we consider a pseudo-continuous-time state-space method for determining digital regulators that place the eigenvalues in a specified region (Fig. 7.2).
7.3.1 Block-diagonalization Via Matrix-sign Function

The definition of the matrix-sign function and an algorithm to compute it are given in Chapter 4. The results leading to the decomposition of a discrete system into a multi-time-scale structure are presented below.

Lemma 7.3.1 [53]

Consider a discrete-time system matrix $G \in \mathbb{R}^{n\times n}$. The mapping $h(G) = (G - \rho I_n)(G + \rho I_n)^{-1}$, $\det[G + \rho I_n] \neq 0$, maps the circle of radius ρ in the discrete z-plane onto the imaginary axis of the h(z)-plane. It maps the interior (exterior) of the circle into the open left half (open right half) of the h(z)-plane. ■

Definition 7.3.1 [53]

Let the eigenvalues of a stable discrete-time system matrix $G \in \mathbb{R}^{n\times n}$ be $\lambda_i$, i = 1, ..., n. For a given positive real number ρ, the nondominant modes of this system are those with $|\lambda_i| < \rho$, and the dominant modes are those with $|\lambda_i| > \rho$, where |(·)| denotes the absolute value of (·). If the eigenvalues of the original system are unknown, as in the case of a large-scale system, ρ can be chosen as $\rho = |\det(G)|^{1/n}$, which is the geometric mean of the magnitudes of the eigenvalues of G. If the given system G is unstable, then we choose ρ = 1. □

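Theorem 7.3.1 [37]

Let $G \in \mathbb{R}^{n\times n}$ and $|\sigma(G)| \cap \{\rho_i,\ i = 0, 1, \ldots, k\} = \emptyset$, where σ(G) represents the eigenspectrum of G, and $\rho_i \in \mathbb{R}$, i = 0, 1, ..., k, represent the radii of circles concentric with the unit circle. Let a set of matrix-sign functions (see Chapter 4) be

$$\operatorname{Sign}_{(\rho_i)}(h(G)) = \operatorname{Sign}\left[(G - \rho_iI_n)(G + \rho_iI_n)^{-1}\right] \quad \text{for } i = 0, 1, \ldots, k. \qquad (7.11a)$$

Define

$$S_i \triangleq \operatorname{ind}\left[\operatorname{Sign}^{+}_{(\rho_{i-1},\rho_i)}(h(G))\right] \in \mathbb{R}^{n\times n_i}, \quad 1 \le i \le k, \qquad (7.11b)$$

where ind(·) represents the collection of the linearly independent column vectors of (·), and

$$\operatorname{Sign}^{+}_{(\rho_{i-1},\rho_i)}(h(G)) = \frac{1}{2}\left[\operatorname{Sign}_{(\rho_{i-1})}(h(G)) - \operatorname{Sign}_{(\rho_i)}(h(G))\right] \qquad (7.11c)$$

with $\rho_0 = 0$ and $\operatorname{Sign}_{(\rho_0)}(h(G)) = I_n$. Assume that $n_i \neq 0$ for 1 ≤ i ≤ k. Then, we have

$$G_B = M_r^{-1}GM_r = \operatorname{block\,diag}\left[G_{Rk},\ G_{R(k-1)},\ \ldots,\ G_{R1}\right], \qquad (7.12a)$$

where $M_r$ is the right block-modal matrix given by

$$M_r = [S_k,\ S_{k-1},\ \ldots,\ S_1], \qquad (7.12b)$$

and

$$G_{Ri} = S_i^{+}GS_i \in \mathbb{R}^{n_i\times n_i} \quad \text{for } 1 \le i \le k. \qquad (7.12c)$$

Here $S_i^{+}$ is the left inverse of $S_i$ and is defined as $S_i^{+} = (S_i^{T}S_i)^{-1}S_i^{T}$.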
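For k = 1 (a two-time-scale split), Lemma 7.3.1 and Theorem 7.3.1 can be sketched in NumPy as follows. Two substitutions are assumptions of this sketch, not the text's method. First, the matrix sign is computed by the basic Newton iteration. Second, ind(·) is replaced by an orthonormal basis of the column space obtained from an SVD; this basis spans the same invariant subspace. The test matrix is hypothetical, and ρ is chosen as in Definition 7.3.1.

```python
import numpy as np


def matrix_sign(M, tol=1e-12, max_iter=100):
    """Basic Newton iteration for the matrix sign function."""
    Z = np.array(M, dtype=float)
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
            return Z_next
        Z = Z_next
    return Z


def range_basis(P, rel_tol=1e-8):
    """Orthonormal basis of the column space of P (stand-in for ind(.) in (7.11b))."""
    U, s, _ = np.linalg.svd(P)
    rank = int(np.sum(s > rel_tol * s[0]))
    return U[:, :rank]


def split_by_radius(G, rho):
    """Two-time-scale case of Theorem 7.3.1: returns M_r^-1 G M_r and the first block size."""
    n = G.shape[0]
    I = np.eye(n)
    hG = (G - rho * I) @ np.linalg.inv(G + rho * I)   # Lemma 7.3.1
    Sg = matrix_sign(hG)
    S1 = range_basis(0.5 * (I - Sg))   # (7.11c) with rho_0 = 0: modes with |lambda| < rho
    S2 = range_basis(0.5 * (I + Sg))   # modes with |lambda| > rho
    M_r = np.hstack([S2, S1])          # outer block first, as in (7.12b)
    return np.linalg.solve(M_r, G @ M_r), S2.shape[1]


# Hypothetical 4x4 example with eigenvalues 0.9, 0.8, 0.1 and 0.05.
V = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.0, 1.0, 0.3, 0.2],
              [0.0, 0.0, 1.0, 0.4],
              [0.0, 0.0, 0.0, 1.0]])
G = V @ np.diag([0.9, 0.8, 0.1, 0.05]) @ np.linalg.inv(V)
rho = abs(np.linalg.det(G)) ** (1.0 / G.shape[0])    # Definition 7.3.1
G_B, n_dom = split_by_radius(G, rho)
print(round(rho, 4), n_dom)
print(np.round(G_B, 6))
```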


7.3.2  Model  Conversions 

Consider the system governed by the continuous-time state equation (as in (7.1)), i.e.,

$$\dot{x}(t) = Ax(t) + Bu(t), \quad x(0). \qquad (7.13)$$

If we approximate u(t) by a piecewise-constant input function,

$$u(t) = u(kT) \quad \text{for } kT \le t < (k+1)T, \qquad (7.14)$$

where T is the sampling period, then we can write the equivalent discrete-time model as

$$x(k+1) = Gx(k) + Hu(k), \quad x(0), \qquad (7.15a)$$

where

$$G = \exp(AT) \quad \text{and} \quad H = [G - I_n]A^{-1}B. \qquad (7.15b)$$

If the input function u(t) is not piecewise constant, a better formulation of the input matrix H can be obtained according to the nature of u(t). In general, the matrices G and H can be determined exactly from the matrices A and B and the input function u(t) in (7.14), using the eigenvalue and eigenvector approach [68]. However, for computational purposes, approximations are needed to obtain the G and H matrices without computing the eigenvalues explicitly. A number of methods are available [18] for approximating G and H in (7.15). The simplest is to truncate the infinite series of exp(AT) [68], which gives a good approximation when T ≪ 1. A popular method for determining G and H approximately is the Padé approximation method [28,68]. Some of the approximations obtained using this method are listed below:

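$$G \approx \left[I_n - \tfrac{1}{2}AT\right]^{-1}\left[I_n + \tfrac{1}{2}AT\right] \triangleq G_1, \qquad (7.16a)$$

$$G \approx \left[I_n - \tfrac{1}{2}AT + \tfrac{1}{12}(AT)^{2}\right]^{-1}\left[I_n + \tfrac{1}{2}AT + \tfrac{1}{12}(AT)^{2}\right] \triangleq G_2, \qquad (7.16b)$$

and

$$H \approx T\left[I_n - \tfrac{1}{2}AT\right]^{-1}B \triangleq H_1, \qquad (7.17a)$$

$$H \approx T\left[I_n - \tfrac{1}{2}AT + \tfrac{1}{12}(AT)^{2}\right]^{-1}B \triangleq H_2. \qquad (7.17b)$$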

Note that the matrices $G_1$ in (7.16a) and $H_1$ in (7.17a) correspond to the popular Tustin approximation (bilinear transformation) [70]. The matrices $G_2$ and $H_2$ provide good approximations even for large sampling periods. Combining one of the above approximations with the scaling and squaring method [68], as shown below, gives still better approximations:

$$G = \left[\exp(AT/m)\right]^{m} \approx \left[G_{*}(T/m)\right]^{m}, \quad m \text{ a power of two}, \qquad (7.18)$$

where $G_{*}(T/m)$ denotes one of the approximations in (7.16) evaluated with T replaced by T/m.
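The approximations (7.16a), (7.16b) and (7.18) are easy to compare numerically. In the NumPy sketch below, the reference exp(AT) is computed by eigendecomposition. That is adequate here only because the hypothetical test matrix is diagonalizable with well-separated eigenvalues. The choice m = 8 is arbitrary, and the function names are illustrative.

```python
import numpy as np


def expm_eig(M):
    """Reference exp(M) via eigendecomposition (assumes M is diagonalizable)."""
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real


def pade_G1(A, T):
    """Tustin (bilinear) approximation (7.16a)."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A * T / 2, I + A * T / 2)


def pade_G2(A, T):
    """Second-order approximation (7.16b)."""
    I = np.eye(A.shape[0])
    AT = A * T
    return np.linalg.solve(I - AT / 2 + AT @ AT / 12, I + AT / 2 + AT @ AT / 12)


def scaled_squared(A, T, m=8):
    """Scaling and squaring (7.18): G ~ [G2(T/m)]^m with m a power of two."""
    return np.linalg.matrix_power(pade_G2(A, T / m), m)


# Hypothetical example: eigenvalues -1 and -2, sampling period T = 1.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
T = 1.0
G = expm_eig(A * T)
err_G1 = np.linalg.norm(pade_G1(A, T) - G, 2)
err_G2 = np.linalg.norm(pade_G2(A, T) - G, 2)
err_ss = np.linalg.norm(scaled_squared(A, T) - G, 2)
print(f"{err_G1:.1e} {err_G2:.1e} {err_ss:.1e}")
```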

Now, given a discrete-time model as in (7.15), an equivalent continuous-time model in (7.13) can be obtained by using the following equations:

$$A = \frac{1}{T}\ln(G) \quad \text{and} \quad B = A[G - I_n]^{-1}H. \qquad (7.19)$$

As before, the matrix A can be obtained exactly from its discrete equivalent G by using the eigenvalue and eigenvector approach. It can also be obtained approximately by truncating the infinite power series of the matrix-logarithm function ln(G), subject to certain convergence conditions. Shieh et al. [28] have proposed a direct-truncation method and a matrix continued-fraction method for determining A from G. The commonly used approximation for (1/T) ln(G), obtained using the matrix continued-fraction method, is

$$A = \frac{1}{T}\ln(G) \approx \cdots \qquad (7.20)$$

where $R = [G - I_n][G + I_n]^{-1}$. The matrix-series approximations obtained from truncation or continued fractions converge when Re(σ(G)) > 0, where σ(G) represents the eigenvalues of G. In general, the eigenvalues of the matrix G are not available, and they do not always lie in the right half of the complex z-plane. To satisfy the convergence condition, the principal qth root of the matrix G [28,29,61] can be used. Shieh et al. [29] and Tsai et al. [61] have recently developed a fast and stable algorithm for computing the principal qth root of a general complex matrix; it is listed in Chapter 3. The eigenvalues of $\sqrt[q]{G}$ lie in the right half of the complex z-plane, i.e., $\operatorname{Re}(\sigma(\sqrt[q]{G})) > 0$, for q > 2. Therefore, the principal qth root of G can be used instead of G in determining an approximation for A. In this case, the matrix equation (7.19) becomes

$$A = \frac{1}{T}\ln(G) = \frac{q}{T}\ln\left(\sqrt[q]{G}\right). \qquad (7.21)$$

As a result, the matrix R in (7.20) becomes $R := [\sqrt[q]{G} - I_n][\sqrt[q]{G} + I_n]^{-1}$, and the constant factor 2/T is replaced by 2q/T. The condition for the convergence of the power series of $\ln(\sqrt[q]{G})$ becomes $\arg(\sigma(G)) \neq \pi$ and $\det(G) \neq 0$, which is much less restrictive.
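The following NumPy sketch illustrates (7.19)-(7.21). It takes q = 2^k principal square roots using the Denman-Beavers iteration, which stands in here for the Chapter 3 algorithm. It then forms R from the qth root and sums the series ln(G) = 2q[R + R³/3 + R⁵/5 + ⋯]. This is the plain truncated power series, not the continued-fraction form of (7.20). The test system is hypothetical and is chosen so that Re(σ(G)) < 0, a case where the direct series with q = 1 would not converge.

```python
import numpy as np


def expm_eig(M):
    """Reference exp(M) via eigendecomposition (assumes M is diagonalizable)."""
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real


def sqrtm_db(G, tol=1e-14, max_iter=100):
    """Principal square root by the Denman-Beavers iteration."""
    Y = np.array(G, dtype=float)
    Z = np.eye(G.shape[0])
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        converged = np.linalg.norm(Y_next - Y, 1) <= tol * np.linalg.norm(Y_next, 1)
        Y, Z = Y_next, Z_next
        if converged:
            break
    return Y


def discrete_to_continuous(G, H, T, k=2, terms=20):
    """(7.19)/(7.21) with q = 2**k; ln is evaluated by the truncated atanh series in R."""
    n = G.shape[0]
    I = np.eye(n)
    Gq = G
    for _ in range(k):                       # principal qth root via repeated square roots
        Gq = sqrtm_db(Gq)
    q = 2 ** k
    R = (Gq - I) @ np.linalg.inv(Gq + I)
    R2 = R @ R
    term = R.copy()
    series = np.zeros_like(R)
    for i in range(terms):                   # R + R^3/3 + R^5/5 + ...
        series += term / (2 * i + 1)
        term = term @ R2
    A = (2 * q / T) * series                 # A = (q/T) ln(G^(1/q)) = (2q/T) atanh(R)
    B = A @ np.linalg.solve(G - I, H)        # B = A [G - I]^-1 H
    return A, B


# Hypothetical system with eigenvalues -0.5 +/- 2.5j; with T = 1, Re(sigma(G)) < 0.
A_true = np.array([[0.0, 1.0], [-6.5, -1.0]])
B_true = np.array([[0.0], [1.0]])
T = 1.0
G = expm_eig(A_true * T)
H = (G - np.eye(2)) @ np.linalg.solve(A_true, B_true)    # (7.15b)
A_hat, B_hat = discrete_to_continuous(G, H, T, k=2)
print(np.linalg.eigvals(G))
print(np.round(A_hat, 10), np.round(B_hat, 10))
```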

7.3.3  Pseudo-continuous-time  Multi-stage  Design  Procedure 

Let the given large-scale discrete-time system with an appropriate sampling period T be

$$x(k+1) = Gx(k) + Hu(k), \quad x(0). \qquad (7.22)$$


Also, let the dimension of the system be n and the number of inputs be m. The objective is, first, to decompose the system into a multi-time-scale structure using techniques based on the matrix-sign function. Each decomposed subsystem is then designed using model conversions, with eigenvalue placement in the hatched region of Fig. 7.2. Finally, the digital regulator for the whole large-scale system is determined.

Step  1. 

Set i = 1 and the feedback gain $K = 0_{m\times n}$; G and H initially denote the given system and input matrices.
Step  2. 

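Now, specify a positive real scalar $\rho_i$ (see Definition 7.3.1) and find a transformation matrix $M_r^{(i)}$ such that the matrix G can be block-diagonalized into the following form:

$$G := \left(M_r^{(i)}\right)^{-1}GM_r^{(i)} = \operatorname{block\,diag}\left[G_c,\ \bar{G}_i,\ G_i\right], \qquad (7.23a)$$

where $G_c \in \mathbb{R}^{(n-n_i)\times(n-n_i)}$ represents a block that has already been designed or does not need to be designed. The matrices $G_i \in \mathbb{R}^{\hat{n}_i\times\hat{n}_i}$ and $\bar{G}_i \in \mathbb{R}^{\bar{n}_i\times\bar{n}_i}$, with $n_i = \hat{n}_i + \bar{n}_i$, contain the eigenvalues less than and greater than (in absolute value) $\rho_i$, respectively. The transformation matrix is given by

$$M_r^{(i)} = \operatorname{block\,diag}\left(I_{n-n_i},\ [S_2,\ S_1]\right), \qquad (7.23b)$$

where $S_1 \in \mathbb{R}^{n_i\times\hat{n}_i}$ and $S_2 \in \mathbb{R}^{n_i\times\bar{n}_i}$ are as defined in (7.11), with respect to the matrix-sign function of the $n_i \times n_i$ block of G that remains to be designed (G itself when i = 1). Using $(M_r^{(i)})^{-1}$, transform H as

$$H := \left(M_r^{(i)}\right)^{-1}H = \left[H_c^{T},\ \bar{H}_i^{T},\ H_i^{T}\right]^{T}. \qquad (7.23c)$$

The dimensions of the matrices $H_c$, $\bar{H}_i$ and $H_i$ are $(n - n_i)\times m$, $\bar{n}_i\times m$ and $\hat{n}_i\times m$, respectively. Accumulate the transformations in $M^{(i)}$.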

Step  3. 

The subsystem considered for design at this stage is $(G_i, H_i)$. Transform this discrete-time system into an equivalent continuous-time system $(A_i, B_i)$ using the principal qth root techniques [29,61], and apply the design procedure given in Section 7.2 to this continuous-time system. Let the intermediate optimal closed-loop continuous-time system be $(A_{c_i}, B_i)$.

Step 4.

Transform the designed continuous-time system into an equivalent discrete-time system $(G_{c_i}, H_{c_i})$ using the techniques discussed earlier in this section.

Step  5. 

If $H_i$ is invertible (nonsingular), then the discrete-time feedback gain for this design stage is given by

$$K_i = (H_i)^{-1}(G_i - G_{c_i}). \qquad (7.24)$$

The dimension of $K_i$ is $m\times\hat{n}_i$. When $H_i$ is nonsquare, the feedback gain can be found through appropriate coordinate transformations [69] and other manipulations [65,70].

Step  6. 

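Update the feedback gain $K$ and the system matrix $\hat G$, respectively, as

$$K := K + [0_{m\times(n-n_i)},\ K_i]\,M^{-1}, \qquad (7.25)$$

$$\hat G := \hat G - \hat H\,[0_{m\times(n-n_i)},\ K_i] = \begin{bmatrix} \tilde G_i & W_i \\ 0_{n_i\times(n-n_i)} & G_{ci} \end{bmatrix}, \qquad (7.26)$$

where $\tilde G_i = \mathrm{block\ diag}\,[G_c, \bar G_i]$, $W_i = -\tilde H_i K_i$ with $\tilde H_i = [H_c^T, \bar H_i^T]^T$, and the dimensions of the matrices $\tilde G_i$ and $W_i$ are $(n-n_i)\times(n-n_i)$ and $(n-n_i)\times n_i$, respectively.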

Step  7. 

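Block-diagonalize the partially designed system $\hat G$ and move the last block of $\hat G$ in (7.26) (viz., $G_{ci}$) to the first position, via a transformation matrix which is given by

$$M_i^{(2)} = \begin{bmatrix} L_i & I_{n-n_i} \\ I_{n_i} & 0_{n_i\times(n-n_i)} \end{bmatrix}. \qquad (7.27a)$$

The matrix $L_i \in \mathbb{R}^{(n-n_i)\times n_i}$ can be solved from the following Lyapunov equation [37], [58]-[60],

$$\tilde G_i L_i - L_i G_{ci} + W_i = 0. \qquad (7.27b)$$

The transformed system is

$$\hat G := (M_i^{(2)})^{-1}\hat G M_i^{(2)} = \begin{bmatrix} G_{ci} & 0_{n_i\times(n-n_i)} \\ 0_{(n-n_i)\times n_i} & \tilde G_i \end{bmatrix}, \qquad (7.28a)$$

$$\hat H := (M_i^{(2)})^{-1}\hat H = [H_i^T,\ (\tilde H_i - L_i H_i)^T]^T, \qquad (7.28b)$$

where $\tilde H_i = [H_c^T, \bar H_i^T]^T$. Accumulate the transformations in $M := M M_i^{(2)}$.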
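Steps 6 and 7 can be checked numerically on small matrices. The following is a minimal sketch, assuming NumPy. It solves the Lyapunov-type equation (7.27b) by vectorization, which means one Kronecker-product linear system. That is adequate for the small blocks involved here, but it is not how large problems would be solved. The sketch then builds $M_i^{(2)}$ of (7.27a) and confirms the block structure claimed in (7.28a) and (7.28b). The function name `swap_designed_block` and the random data are illustrative only.

```python
import numpy as np

def swap_designed_block(G_tilde, W, G_ci):
    """Solve G_tilde @ L - L @ G_ci + W = 0 (7.27b) and build M2 of (7.27a)."""
    r, c = G_tilde.shape[0], G_ci.shape[0]          # r = n - n_i, c = n_i
    # Column-major vec: vec(G_tilde L) = (I_c kron G_tilde) vec(L),
    #                   vec(L G_ci)    = (G_ci^T kron I_r) vec(L).
    op = np.kron(np.eye(c), G_tilde) - np.kron(G_ci.T, np.eye(r))
    vec_L = np.linalg.solve(op, -W.reshape(-1, order="F"))
    L = vec_L.reshape((r, c), order="F")
    M2 = np.block([[L, np.eye(r)], [np.eye(c), np.zeros((c, r))]])
    return L, M2

# Illustrative data: a designed 2x2 block G_ci and an undesigned 3x3 block.
rng = np.random.default_rng(1)
G_tilde = np.diag([0.6, 0.3, 0.02]) + 0.05 * rng.standard_normal((3, 3))
G_ci = np.array([[-0.07, -0.03], [0.37, -0.05]])
W = rng.standard_normal((3, 2))
G_hat = np.block([[G_tilde, W], [np.zeros((2, 3)), G_ci]])   # form of (7.26)
H_hat = rng.standard_normal((5, 2))                          # [H_tilde; H_i]

L, M2 = swap_designed_block(G_tilde, W, G_ci)
G_new = np.linalg.solve(M2, G_hat @ M2)   # expected: block diag [G_ci, G_tilde]
H_new = np.linalg.solve(M2, H_hat)        # expected: [H_i; H_tilde - L H_i]
```

The Sylvester equation is uniquely solvable whenever $\tilde G_i$ and $G_{ci}$ share no eigenvalue. That is the situation after a design stage has moved the eigenvalues of $G_i$ into the target region.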

Step  8. 

Set $i := i + 1$. If $i > k$ ($k$ is the number of time scales), then stop; else, go to Step 2.

The digital regulator

$$u(k) = -Kx(k) + r(k), \qquad (7.29)$$

with $r(k)$ as any reference input, would place the eigenvalues of the system in (7.22) within the hatched region of Fig. 7.2. Also, when the sampling period $T$ is sufficiently small, the digital regulator can be considered as a suboptimal discrete-time regulator because of the approximations involved in the inputs and the various model conversions, although the equivalent continuous-time regulator is optimal. ■

7.4  Illustrative  Example 

Consider an unstable discrete-time system in (7.22) with

$$G = \begin{bmatrix} -0.357 & -0.657 & -0.146 & 0.119 & -0.041 \\ 1.675 & -0.460 & 0.335 & 0.000 & 0.167 \\ -0.075 & 0.146 & 0.360 & -0.593 & 0.009 \\ 0.376 & 0.263 & 0.518 & 0.280 & 0.016 \\ -0.882 & 0.252 & -0.176 & 0.000 & -0.070 \end{bmatrix}, \qquad H = \begin{bmatrix} 0.689 & 0.283 \\ 0.240 & -0.387 \\ -0.339 & 0.332 \\ 0.063 & 0.020 \\ -0.126 & 0.268 \end{bmatrix}, \qquad (7.30a)$$

and $\sigma(G) = \{-0.46 \pm j1.005,\ 0.3276 \pm j0.5103,\ 0.0179\}$ for $T = 0.5$.
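A quick consistency check of the data in (7.30a) is possible. The trace of $G$ must equal the sum of the stated eigenvalues, and exactly two eigenvalues should lie outside the unit circle. The sketch below assumes NumPy. The matrix is typed in at the printed three-decimal precision, so the computed eigenvalues can only be expected to agree with $\sigma(G)$ to roughly that precision.

```python
import numpy as np

# G of (7.30a) at the printed precision.
G = np.array([
    [-0.357, -0.657, -0.146,  0.119, -0.041],
    [ 1.675, -0.460,  0.335,  0.000,  0.167],
    [-0.075,  0.146,  0.360, -0.593,  0.009],
    [ 0.376,  0.263,  0.518,  0.280,  0.016],
    [-0.882,  0.252, -0.176,  0.000, -0.070],
])
stated = np.array([-0.46 + 1.005j, -0.46 - 1.005j,
                   0.3276 + 0.5103j, 0.3276 - 0.5103j, 0.0179 + 0j])

eigs = np.linalg.eigvals(G)
# Distance from each stated eigenvalue to the nearest computed one.
mismatch = np.array([np.min(np.abs(eigs - lam)) for lam in stated])
n_outside = int(np.sum(np.abs(eigs) > 1.0))   # poles outside the unit circle
```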

The locations of the poles of $G$ in the discrete z-plane are shown in Fig. 7.2. Except for the pole at 0.0179, which is to be kept invariant, all of the poles lie outside the region of interest. The objective is to design the discrete-time system in (7.30a) with multi-time-scale decomposition and pole assignment within the specified region in the z-plane. The pseudo-continuous-time design procedure given in Section 7.3.3 will be used to achieve the desired design.

Since the given system is unstable, the first step is to block-decompose the system into its stable and unstable parts. Assign $\rho_1 = 1$ (which represents the unit circle).

The transformation matrix found using the matrix-sign function technique given in Section 7.3.1, which block-diagonalizes $G$, is given by

$$M_1^{(1)} = [S_2,\ S_1] = \begin{bmatrix} -0.0464 & -0.0551 & -0.2090 & 1.0463 & 0.0551 \\ 0.0003 & -0.0002 & 0.0003 & -0.0003 & 1.0002 \\ 0.2325 & 0.0124 & 1.0464 & -0.2325 & -0.0124 \\ 0.4189 & 0.0222 & 0.0837 & -0.4189 & -0.0222 \\ 0.0002 & 0.5269 & 0.0000 & -0.0002 & -0.5269 \end{bmatrix},$$

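where $S_2 \in \mathbb{R}^{5\times3}$ and $S_1 \in \mathbb{R}^{5\times2}$ can be found from (7.11). The transformed matrices, using $M_1^{(1)}$, corresponding to $G$ and $H$ in (7.30a), are

$$\hat G := (M_1^{(1)})^{-1}GM_1^{(1)} = \mathrm{block\ diag}\,[\bar G_1,\ G_1] = \mathrm{block\ diag}\left[\begin{bmatrix} 0.5827 & -0.0002 & 1.1480 \\ -0.0002 & 0.0179 & -0.0001 \\ -0.2835 & 0.0000 & 0.0726 \end{bmatrix},\ \begin{bmatrix} -0.4601 & -0.6027 \\ 1.6744 & -0.4601 \end{bmatrix}\right], \qquad (7.31a)$$

$$\hat H := (M_1^{(1)})^{-1}H = \begin{bmatrix} \bar H_1 \\ H_1 \end{bmatrix} = \begin{bmatrix} 0.8464 & 0.3328 \\ 0.0007 & 0.1218 \\ -0.3739 & 0.3208 \\ 0.6087 & 0.3761 \\ 0.2400 & -0.3870 \end{bmatrix}. \qquad (7.31b)$$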
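The block decomposition at $\rho_1 = 1$ can be reproduced in spirit with a few lines of NumPy. Definition 7.3.1 and (7.11) are not reproduced in this section, so the sketch below substitutes a standard construction.

- It takes the matrix-sign function, computed by Newton iteration, of the Cayley-type transform $(G - \rho I)^{-1}(G + \rho I)$.
- That transform's eigenvalues have negative real part exactly when the corresponding eigenvalues of $G$ lie inside $|z| = \rho$.
- Orthonormal bases of the two spectral projectors then play the role of $[S_2, S_1]$.

Because these bases differ from the columns chosen in (7.11), the entries of the resulting blocks will not match (7.31a) number for number; only their eigenvalues should agree. The function names are illustrative.

```python
import numpy as np

def matrix_sign(X, tol=1e-13, max_iter=100):
    """Newton iteration X <- (X + X^{-1}) / 2 for the matrix-sign function."""
    X = np.array(X, dtype=float)
    for _ in range(max_iter):
        X_next = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_next - X, 1) <= tol * np.linalg.norm(X_next, 1):
            return X_next
        X = X_next
    return X

def split_by_circle(G, rho):
    """Block-diagonalize G into eigenvalues inside / outside the circle |z| = rho."""
    n = G.shape[0]
    I = np.eye(n)
    # Re[(z + rho)/(z - rho)] = (|z|^2 - rho^2)/|z - rho|^2, so its sign separates the sets.
    S = matrix_sign(np.linalg.solve(G - rho * I, G + rho * I))
    bases = []
    for proj in (0.5 * (I - S), 0.5 * (I + S)):      # inside, then outside
        rank = int(round(np.trace(proj)))           # trace of a projector = its rank
        U, _, _ = np.linalg.svd(proj)
        bases.append(U[:, :rank])                   # orthonormal basis of its range
    M = np.hstack(bases)                            # plays the role of [S_2, S_1]
    return M, np.linalg.solve(M, G @ M), bases[0].shape[1]

G = np.array([
    [-0.357, -0.657, -0.146,  0.119, -0.041],
    [ 1.675, -0.460,  0.335,  0.000,  0.167],
    [-0.075,  0.146,  0.360, -0.593,  0.009],
    [ 0.376,  0.263,  0.518,  0.280,  0.016],
    [-0.882,  0.252, -0.176,  0.000, -0.070],
])   # (7.30a)
M, G_blocks, k = split_by_circle(G, rho=1.0)
G_in, G_out = G_blocks[:k, :k], G_blocks[k:, k:]   # analogues of G_1-bar and G_1
```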


The eigenspectra of the diagonal blocks in (7.31a) are $\sigma(\bar G_1) = \{0.3276 \pm j0.5103,\ 0.0179\}$ and $\sigma(G_1) = \{-0.46 \pm j1.005\}$. The unstable subsystem $(G_1, H_1)$ is to be designed at this stage. The equivalent continuous-time subsystem is found using the principal $q$th root of $G_1$ ($q = 4$) (the algorithm in Chapter 3). Note that since the eigenvalues of $G_1$ lie in the left-half z-plane, the well-known bilinear transformation for model conversion will not converge. The continuous-time subsystem is

$$A_1 = \begin{bmatrix} 0.1996 & -2.4000 \\ 6.6679 & 0.1997 \end{bmatrix}, \qquad B_1 = \begin{bmatrix} 0.9993 & -0.0003 \\ -1.6667 & -1.6648 \end{bmatrix}, \qquad (7.32a)$$

with $\sigma(A_1) = \{0.1996 \pm j4.0006\}$. Assign $h = 1.1$, i.e., the eigenvalues of the closed-loop system should lie to the left of the vertical line at $-1.1$ on the negative real axis in the s-plane, and $R = I_2$. To achieve the necessary design, we follow the steps of the continuous-time design procedure in Section 7.2.1. Let $A = A_1$ and $B = B_1$. Solving the Riccati equation in (7.6) with $(A + hI_2, B)$, we have


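$$P_0 = \begin{bmatrix} 2.250 & -0.038 \\ -0.038 & 0.509 \end{bmatrix}, \qquad K_0 = R^{-1}B^TP_0 = \begin{bmatrix} 2.311 & -0.887 \\ 0.062 & -0.848 \end{bmatrix}. \qquad (7.32b)$$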
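Two statements above can be checked with printed data.

- **The q-th root conversion.** A discrete eigenvalue $z$ corresponds to the continuous eigenvalue $s = \ln z / T$. Taking the principal fourth root moves the left-half-plane eigenvalue of $G_1$ into the right half-plane, which is why the root is taken before conversion.
- **The gain formula.** The gain $K_0 = R^{-1}B^TP_0$ reproduces (7.32b) and the closed loop (7.32c).

A minimal NumPy sketch follows; the numbers are the printed ones, so agreement is only to about three decimals.

```python
import numpy as np

T = 0.5
z = -0.46 + 1.005j                 # an eigenvalue of G_1 in (7.31a)
s = np.log(z) / T                  # principal logarithm: continuous-time counterpart
z_q = z ** 0.25                    # principal 4th root (q = 4): moved into Re > 0

A1 = np.array([[0.1996, -2.4000], [6.6679, 0.1997]])       # (7.32a)
B1 = np.array([[0.9993, -0.0003], [-1.6667, -1.6648]])
P0 = np.array([[2.250, -0.038], [-0.038, 0.509]])          # (7.32b)
R = np.eye(2)

K0 = np.linalg.solve(R, B1.T @ P0)    # K_0 = R^{-1} B^T P_0
A1_tilde = A1 - B1 @ K0               # closed loop, cf. (7.32c)
eig_cl = np.linalg.eigvals(A1_tilde)
```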


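The resulting closed-loop system is

$$\tilde A_1 = A - BK_0 = \begin{bmatrix} -2.110 & -1.515 \\ 10.623 & -2.690 \end{bmatrix}, \qquad (7.32c)$$

and $\sigma(\tilde A_1) = \{-2.3996 \pm j4.0006\}$. Note that $|\mathrm{Re}\,\sigma(\tilde A_1)| > 1.1$. Now, solving the Riccati equation in (7.7a) with $(-\tilde A_1, B)$, we have

$$Q_1 = \begin{bmatrix} 31.469 & 1.284 \\ 1.284 & 2.494 \end{bmatrix}, \qquad \text{and} \qquad \tfrac{1}{2}\,\mathrm{tr}\,[BR^{-1}B^TQ_1] = 20.49 \neq 0. \qquad (7.32d)$$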
Solving the Riccati equation in (7.7b) with $(\tilde A_1, B)$ and $Q_1$, we obtain

$$P_1 = \begin{bmatrix} 4.093 & 0.073 \\ 0.073 & 0.326 \end{bmatrix}, \qquad K_1 = R^{-1}B^TP_1 = \begin{bmatrix} 3.968 & -0.471 \\ -0.123 & -0.543 \end{bmatrix}. \qquad (7.32e)$$
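The scalar test in (7.32d) and the gain in (7.32e) are plain matrix arithmetic on the printed data. The sketch below, assuming NumPy, recomputes $\tfrac12\mathrm{tr}[BR^{-1}B^TQ_1]$ (about 20.49) and $K_1 = R^{-1}B^TP_1$. It does not re-solve the Riccati equations themselves; $Q_1$ and $P_1$ are taken as printed.

```python
import numpy as np

B = np.array([[0.9993, -0.0003], [-1.6667, -1.6648]])
R = np.eye(2)
Q1 = np.array([[31.469, 1.284], [1.284, 2.494]])   # (7.32d)
P1 = np.array([[4.093, 0.073], [0.073, 0.326]])    # (7.32e)

BRB = B @ np.linalg.solve(R, B.T)          # B R^{-1} B^T
half_trace = 0.5 * np.trace(BRB @ Q1)      # nonzero -> another design step is needed
K1 = np.linalg.solve(R, B.T @ P1)          # K_1 = R^{-1} B^T P_1
```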


From (7.9), the constant gain $r_1 = 0.6385$. Therefore, the closed-loop system is

$$A_2 = \tilde A_1 - r_1BK_1 = \begin{bmatrix} -4.641 & -1.214 \\ 14.714 & -3.768 \end{bmatrix}, \qquad Q_2 = 0_2, \qquad (7.32f)$$


and $\sigma(A_2) = \{-4.2046 \pm j4.2046\}$. Note that all the eigenvalues lie on the boundary of the hatched region in Fig. 7.1, $\mathrm{tr}\,[A_2^2] = 0$ and $\tfrac{1}{2}\,\mathrm{tr}\,[BR^{-1}B^TQ_2] = 0$, where $Q_2$ is solved from (7.7a) with respect to $(-A_2, B)$. This verifies that the desired design has been achieved for the subsystem in (7.32a). Let us denote this closed-loop subsystem by $A_{c1} = A_1 - B_1(K_0 + r_1K_1)$. Now, we transform this continuous-time system into its equivalent discrete-time system, $G_{c1}$, given by


$$G_{c1} = \begin{bmatrix} -0.0729 & -0.0304 \\ 0.3686 & -0.0510 \end{bmatrix}. \qquad (7.33a)$$
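The closed loop $A_2$ can be rebuilt from $\tilde A_1$, $B$, $K_1$ and $r_1 = 0.6385$, using the printed values. Two further facts then follow numerically:

- Eigenvalues of the form $-a \pm ja$ satisfy $\mathrm{tr}[A_2^2] = \sum \lambda^2 = 0$.
- Mapping them through $z = e^{sT}$ gives the spectrum of $G_{c1}$ in (7.33a).

A NumPy sketch:

```python
import numpy as np

T, r1 = 0.5, 0.6385
B = np.array([[0.9993, -0.0003], [-1.6667, -1.6648]])
A1_tilde = np.array([[-2.110, -1.515], [10.623, -2.690]])   # (7.32c)
K1 = np.array([[3.968, -0.471], [-0.123, -0.543]])          # (7.32e)
A2 = A1_tilde - r1 * B @ K1                                  # (7.32f)

s = np.linalg.eigvals(A2)
tr_A2_sq = np.trace(A2 @ A2)       # ~0 when the eigenvalues lie on the +-45 degree lines
z = np.exp(s * T)                  # eigenvalues of the equivalent discrete-time system
Gc1 = np.array([[-0.0729, -0.0304], [0.3686, -0.0510]])     # (7.33a)
```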


The eigenspectrum corresponding to this system matrix is $\sigma(G_{c1}) = \{-0.0619 \pm j0.1053\}$. Note that this complex-conjugate pair is inside the hatched region of Fig. 7.2. The discrete-time feedback gain for this stage is


$$K_1 = H_1^{-1}(G_1 - G_{c1}) = \begin{bmatrix} 1.047 & -1.152 \\ -2.725 & 0.343 \end{bmatrix}. \qquad (7.33b)$$
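Formula (7.24) can be applied directly to the printed blocks. The sketch below, assuming NumPy, takes $G_1$ from (7.31a), $H_1$ and $\bar H_1$ from (7.31b), and $G_{c1}$ from (7.33a). It recovers $K_1$ of (7.33b) and the coupling block $W_1 = -\bar H_1K_1$ that appears in the updated system.

```python
import numpy as np

G1 = np.array([[-0.4601, -0.6027], [1.6744, -0.4601]])     # (7.31a)
H1 = np.array([[0.6087, 0.3761], [0.2400, -0.3870]])       # last two rows of (7.31b)
H1_bar = np.array([[0.8464, 0.3328], [0.0007, 0.1218], [-0.3739, 0.3208]])
Gc1 = np.array([[-0.0729, -0.0304], [0.3686, -0.0510]])    # (7.33a)

K1 = np.linalg.solve(H1, G1 - Gc1)    # (7.24): K_1 = H_1^{-1}(G_1 - G_c1)
W1 = -H1_bar @ K1                     # coupling block of (7.26) for i = 1
```

Using `np.linalg.solve` rather than forming $H_1^{-1}$ explicitly is the usual idiom; both give the same gain here.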


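Using this feedback gain, the updated system is given by

$$\hat G := \hat G - \hat H[0_{2\times3},\ K_1] = \begin{bmatrix} \bar G_1 & W_1 \\ 0_{2\times3} & G_{c1} \end{bmatrix}, \qquad (7.33c)$$

with

$$\bar G_1 = \begin{bmatrix} 0.5827 & -0.0002 & 1.1480 \\ -0.0002 & 0.0179 & -0.0001 \\ -0.2835 & 0.0000 & 0.0726 \end{bmatrix}, \quad W_1 = -\bar H_1K_1 = \begin{bmatrix} 0.0204 & 0.8610 \\ 0.3311 & -0.0410 \\ 1.2657 & -0.5407 \end{bmatrix}, \quad G_{c1} = \begin{bmatrix} -0.0729 & -0.0304 \\ 0.3686 & -0.0510 \end{bmatrix}.$$

The updated feedback gain $K$ is

$$K := K + [0_{2\times3},\ K_1]\,M^{-1} = \begin{bmatrix} 1.047 & -1.152 & 0.209 & 0.000 & 0.104 \\ -2.725 & 0.343 & -0.544 & 0.000 & -0.272 \end{bmatrix}. \qquad (7.33d)$$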
The solution of the Lyapunov equation in (7.27b) for $i = 1$ and $n_1 = 2$ is

$$L_1 = \begin{bmatrix} 2.893 & -2.029 \\ -0.449 & 0.787 \\ -2.322 & 0.293 \end{bmatrix}. \qquad (7.33e)$$

Thus the transformation matrix that block-diagonalizes $\hat G$ in (7.33c) and swaps the blocks $G_{c1}$ and $\bar G_1$ is given by

$$M_1^{(2)} = \begin{bmatrix} L_1 & I_3 \\ I_2 & 0_{2\times3} \end{bmatrix}. \qquad (7.33f)$$


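The transformed system is now given by

$$\hat G := (M_1^{(2)})^{-1}\hat G M_1^{(2)} = \begin{bmatrix} G_{c1} & 0_{2\times3} \\ 0_{3\times2} & \bar G_1 \end{bmatrix}, \qquad (7.34a)$$

$$\hat H := (M_1^{(2)})^{-1}\hat H = \begin{bmatrix} 0.6087 & 0.3761 \\ 0.2400 & -0.3870 \\ -0.4280 & -1.5404 \\ 0.0853 & 0.5952 \\ 0.9688 & 1.3074 \end{bmatrix}. \qquad (7.34b)$$

The accumulated transformation becomes $M := M M_1^{(2)}$.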

Now, we proceed to the second stage of design, which consists of designing the stable dominant poles of the original discrete-time system in (7.30a). Choose $\rho_2 = e^{-hT} = e^{-1.1\times0.5} = 0.57695$. The transformation matrix $M_2^{(1)}$ which block-diagonalizes the block $\bar G_1$ in (7.33c) while preserving the block $G_{c1}$ is given by (as in (7.23b))

$$M_2^{(1)} = \begin{bmatrix} I_2 & 0_{2\times3} \\ 0_{3\times2} & [S_2,\ S_1] \end{bmatrix}, \qquad (7.34c)$$

$$[S_2,\ S_1] = \cdots, \qquad (7.34d)$$

where $S_2 \in \mathbb{R}^{3\times1}$ and $S_1 \in \mathbb{R}^{3\times2}$ can be found from (7.11) with respect to $\bar G_1$ and $\rho_2$. The transformed matrices $\hat G$ and $\hat H$ are


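$$\hat G := (M_2^{(1)})^{-1}\hat G M_2^{(1)} = \mathrm{block\ diag}\,[G_{c1},\ \bar G_2,\ G_2] = \mathrm{block\ diag}\left[\begin{bmatrix} -0.0729 & -0.0304 \\ 0.3686 & -0.0510 \end{bmatrix},\ (0.0179),\ \begin{bmatrix} 0.5827 & 1.1480 \\ -0.2835 & 0.0726 \end{bmatrix}\right], \qquad (7.35a)$$

$$\hat H := (M_2^{(1)})^{-1}\hat H = \begin{bmatrix} 0.6087 & 0.3761 \\ 0.2400 & -0.3870 \\ 0.0847 & 0.5943 \\ -0.4279 & -1.5404 \\ 0.9687 & 1.3073 \end{bmatrix}. \qquad (7.35b)$$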


Again, the accumulated transformation becomes $M := M M_2^{(1)}$. The subsystem to be designed in this stage is $(G_2, H_2)$. Following the same procedures as in the first stage, we obtain the designed discrete subsystem as

$$G_{c2} = \begin{bmatrix} 0.4272 & 1.1292 \\ -0.1513 & -0.1608 \end{bmatrix}, \qquad (7.35c)$$

with $\sigma(G_{c2}) = \{0.1332 \pm j0.2905\}$. Again, note that these eigenvalues are within the hatched region of Fig. 7.2. The discrete-time feedback gain for this stage is

$$K_2 = H_2^{-1}(G_2 - G_{c2}) = \begin{bmatrix} 0.0000 & 0.4117 \\ -0.1008 & -0.1265 \end{bmatrix}. \qquad (7.35d)$$
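The second-stage gain can be recomputed from (7.24) in the same way. The sketch assumes NumPy and takes the (4,2) entry of (7.35b) as $-1.5404$, the value carried over from (7.34b). Elsewhere, the second-stage transformation changes the entries of $\hat H$ only in the fourth decimal. With that entry, the computed $K_2$ agrees with the printed gain to its precision, including a negative (2,1) entry, and $G_{c2}$ has the stated spectrum.

```python
import numpy as np

G2 = np.array([[0.5827, 1.1480], [-0.2835, 0.0726]])     # (7.35a)
H2 = np.array([[-0.4279, -1.5404], [0.9687, 1.3073]])    # last two rows of (7.35b)
Gc2 = np.array([[0.4272, 1.1292], [-0.1513, -0.1608]])   # (7.35c)

K2 = np.linalg.solve(H2, G2 - Gc2)    # K_2 = H_2^{-1}(G_2 - G_c2)
eig_Gc2 = np.linalg.eigvals(Gc2)
```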


The updated feedback gain is

$$K := K + [0_{2\times3},\ K_2]\,M^{-1} = \begin{bmatrix} 2.000 & -1.274 & 0.812 & -0.229 & 0.199 \\ -2.828 & 0.175 & -0.671 & -0.181 & -0.272 \end{bmatrix}. \qquad (7.35e)$$

The eigenvalues of $\hat G - \hat H[0_{2\times3},\ K_2]$, with $\hat G$ and $\hat H$ as in (7.35a) and (7.35b), are $\{-0.0619 \pm j0.1053,\ 0.1332 \pm j0.2905,\ 0.0179\}$. Note that all of them are within the hatched region of Fig. 7.2, and the nondominant eigenvalue of the open-loop system at 0.0179 is not designed. Therefore, the closed-loop discrete-time system is

$$G_c = G - HK = \begin{bmatrix} -0.9374 & 0.1709 & -0.5157 & 0.3282 & -0.1013 \\ 0.0997 & -0.0864 & -0.1197 & -0.0151 & 0.0139 \\ 1.5432 & -0.3440 & 0.8582 & -0.6105 & 0.1669 \\ 0.3063 & 0.3397 & 0.4803 & 0.2981 & 0.0089 \\ 0.1284 & 0.0445 & 0.1062 & 0.0197 & 0.0280 \end{bmatrix}. \qquad (7.36a)$$
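Because $K$ in (7.35e) is expressed in the original coordinates, the closed loop can be formed directly from (7.30a). The sketch below, assuming NumPy, computes $G_c = G - HK$. It checks a few printed entries and the trace, which should be close to the sum of the assigned eigenvalues, $2(-0.0619) + 2(0.1332) + 0.0179 = 0.1605$. Agreement is limited by the three-decimal rounding of $G$, $H$ and $K$.

```python
import numpy as np

G = np.array([
    [-0.357, -0.657, -0.146,  0.119, -0.041],
    [ 1.675, -0.460,  0.335,  0.000,  0.167],
    [-0.075,  0.146,  0.360, -0.593,  0.009],
    [ 0.376,  0.263,  0.518,  0.280,  0.016],
    [-0.882,  0.252, -0.176,  0.000, -0.070],
])                                                           # (7.30a)
H = np.array([[0.689, 0.283], [0.240, -0.387], [-0.339, 0.332],
              [0.063, 0.020], [-0.126, 0.268]])               # (7.30a)
K = np.array([[2.000, -1.274, 0.812, -0.229, 0.199],
              [-2.828, 0.175, -0.671, -0.181, -0.272]])       # (7.35e)

Gc = G - H @ K                     # closed loop of (7.36a)
eig_Gc = np.linalg.eigvals(Gc)     # compare with the assigned spectrum
```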

The pseudo-continuous-time pole-placement regulator is given by

$$u(k) = -Kx(k) + r(k), \qquad (7.36b)$$

where $K$ is the total feedback gain as in (7.35e) and $r(k)$ is any reference input.
7.5  Conclusion 

The design of large-scale discrete-time systems, which do not exhibit a two- or multi-time-scale structure explicitly, has been considered in this chapter. It has been shown that a large-scale discrete system can be decomposed into a completely decoupled multi-time-scale structure (block-diagonalization) using techniques based on the matrix-sign function, without explicitly utilizing the open-loop eigenvalues of the given system. A pseudo-continuous-time state-space method, based on model conversions, has been developed for methodically designing each subsystem (corresponding to one time scale), with eigenvalue placement in a desired region of the complex z-plane. The model conversions and various other computations can be achieved using fast and stable algorithms based on the principal $q$th root of the system matrix and the matrix-sign functions. When the sampling period $T$ is sufficiently small, the designed discrete controller is suboptimal while its associated continuous-time controller is optimal with respect to certain weighting matrices. The proposed method requires only the solution of small-order Riccati equations at each stage of the design. Transformation to general canonical form so as to determine the discrete feedback gain can be avoided in most cases. The developed state-space method can be used to design multivariable digital control systems, for determining the state-feedback pole-placement controllers, whereas the existing pseudo-continuous-time frequency-domain method [71] can only be applied to design single-variable digital control systems for obtaining the cascaded controllers.
Fig. 7.1 The region of interest in the continuous-time s-plane.


Fig. 7.2 The region of interest in the discrete-time z-plane.


Chapter  8 
Conclusions 


A  complete  study  of  the  principal  nth  root  of  a  complex  matrix  and  associated 
matrix-valued  functions  is  presented  in  this  research  monograph.  This  includes  the 
development  of  techniques  to  compute  the  principal  nth  root  of  a  matrix,  study  of 
associated  matrix- valued  functions,  and  their  applications  to  mathematical  sciences 
and  control  systems. 

In Chapter 2, the generalized continued-fraction method developed for finding the nth roots of real numbers has been extended to determine the principal nth roots of complex matrices. Computational algorithms with high-order convergence rates have been established for the determination of the principal nth root and the associated pth power of the principal nth root of a complex matrix. The global convergence properties of the proposed algorithms have been investigated from the viewpoint of systems theory.

Rapidly convergent and more stable recursive algorithms for finding the principal nth root of a matrix have been developed in Chapter 3. The developed recursive algorithms can be applied to an ill-conditioned matrix containing large and small eigenvalues. By means of a perturbation analysis with suitable assumptions, it is shown that the proposed recursive algorithms are numerically more stable than the algorithms in [20, 21, 26]. The analysis of the absolute numerical stability of the proposed algorithms has not been done in this research monograph. The developed algorithms will enhance the capabilities of existing computational algorithms, such as the principal nth root algorithm, the matrix-sign algorithm and the matrix-sector algorithm, which in turn can be applied to many control-system problems.

In Chapter 4, the matrix-sector function of A has been generalized to the matrix-sector function of g(A). Based on the computationally fast and numerically stable algorithms for computing the principal nth root of a matrix, fast and stable algorithms for computing the matrix-sector function and the generalized matrix-sector function have been developed. The generalized matrix-sector function of A has been utilized to carry out the separation of matrix eigenvalues relative to a sector, a circle, and a sector of a circle in the λ-plane. Also, the generalized matrix-sector function of A has been employed for block-diagonalization and block-triangularization of the system matrix, which are useful in developing applications to mathematical science [32] and control-system problems [31].

New computational methods, which utilize the direct-truncation method, the matrix continued-fraction method, and the geometric-series method together with the principal qth root of a discrete-time system matrix, have been presented in Chapter 5 for quick modeling of the equivalent continuous-time state equations from the discrete-time state equations. The proposed method is useful for identifying a continuous-time system based on the observation of sampled input-output data and for the design of sampled-data control systems.

The amplitude and phase of a complex matrix and the projected real and imaginary parts of the complex matrix have been defined, and computational methods for finding these matrices have been proposed in Chapter 6. The algorithm for finding the principal nth root of a complex matrix has been employed for computing the amplitude and phase of the original complex matrix. This uses an important property of the matrix-sign function: the associated matrix-sign functions of a shifted complex matrix preserve the eigenvectors of the original matrix. The newly developed geometric-series method can be utilized for finding an approximation of the matrix-valued function $\tan^{-1}(X)$, where $X$ is a matrix. Questions of computational cost have not, however, been considered in any detail. The applications of the developed amplitude and phase of a complex matrix to systems theory [32] are being investigated.

The design of large-scale discrete-time systems, which do not exhibit a two- or multi-time-scale structure explicitly, has been considered in Chapter 7. It has been shown that a large-scale discrete system can be decomposed into a completely decoupled multi-time-scale structure (block-diagonalization) using techniques based on the matrix-sign function, without explicitly utilizing the open-loop eigenvalues of the given system. A pseudo-continuous-time state-space method, based on model conversions, has been developed for methodically designing each subsystem (corresponding to one time scale), with eigenvalue placement in a desired region of the complex z-plane. The model conversions and various other computations can be achieved using fast and stable algorithms based on the principal qth root of the system matrix and the matrix-sign functions. When the sampling period T is sufficiently small, the designed discrete controller is suboptimal while its associated continuous-time controller is optimal with respect to certain weighting matrices. The proposed method requires only the solution of small-order Riccati equations at each stage of the design. Transformation to general canonical form so as to determine the discrete feedback gain can be avoided in most cases. The developed state-space method can be used to design multivariable digital control systems, for determining the state-feedback pole-placement controllers, whereas the existing pseudo-continuous-time frequency-domain method [71] can only be applied to design single-variable digital control systems for obtaining the cascaded controllers.
Appendix A

Solution  of  Riccati  Equation  Via  Matrix-sign  Function 


The Riccati equation for the controllable continuous-time system $(A, B)$ with weighting matrices $Q\ (\geq 0)$ and $R\ (> 0)$ is given by

$$PBR^{-1}B^TP - A^TP - PA - Q = 0. \qquad (A.1a)$$

The steady-state solution of this Riccati equation, $P\ (\geq 0)$, with $(Q, A)$ detectable, can be easily computed using the properties of the matrix-sign function [9, 23]. Consider the Hamiltonian matrix associated with the given system,

$$H = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix}. \qquad (A.1b)$$


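The following algorithm can be utilized to obtain the solution $P$:

$$H_{k+1} = \tfrac{1}{2}\left[H_k + H_k^{-1}\right], \quad H_0 = H, \quad \text{and} \quad \lim_{k\to\infty} H_k = \mathrm{Sign}(H). \qquad (A.2a)$$

Let

$$\mathrm{Sign}^{+}(H) = \tfrac{1}{2}\left[I_{2n} + \mathrm{Sign}(H)\right]. \qquad (A.2b)$$

Construct a block-modal matrix $X$ as

$$X = \left[\mathrm{ind}\big(\mathrm{Sign}^{+}(H)\big),\ \mathrm{ind}\big(I_{2n} - \mathrm{Sign}^{+}(H)\big)\right] = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}, \qquad (A.3a)$$

where ind(·) represents the collection of the linearly independent column vectors of (·). Then, we have

$$P = X_{22}X_{12}^{-1}. \qquad (A.3b)$$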
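A minimal NumPy sketch of (A.1b)-(A.3) follows. One substitution is made: instead of selecting linearly independent columns with ind(·), $P$ is obtained from the stable-subspace condition $(\mathrm{Sign}(H) + I)\begin{bmatrix} I \\ P\end{bmatrix} = 0$. That condition is solved in the least-squares sense, and it characterizes the same subspace as the columns of $I_{2n} - \mathrm{Sign}^+(H)$. The function names are illustrative. The double-integrator test data are also illustrative; their stabilizing solution is $\begin{bmatrix}\sqrt3 & 1\\ 1 & \sqrt3\end{bmatrix}$.

```python
import numpy as np

def matrix_sign(X, tol=1e-13, max_iter=100):
    """Newton iteration (A.2a): X <- (X + X^{-1}) / 2."""
    X = np.array(X, dtype=float)
    for _ in range(max_iter):
        X_next = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_next - X, 1) <= tol * np.linalg.norm(X_next, 1):
            return X_next
        X = X_next
    return X

def care_by_sign(A, B, Q, R):
    """Stabilizing solution of P B R^{-1} B^T P - A^T P - P A - Q = 0 via Sign(H)."""
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.solve(R, B.T)], [-Q, -A.T]])   # (A.1b)
    S = matrix_sign(H)
    # Stable invariant subspace spanned by [I; P]:  (S + I) [I; P] = 0.
    lhs = np.vstack([S[:n, n:], S[n:, n:] + np.eye(n)])
    rhs = -np.vstack([S[:n, :n] + np.eye(n), S[n:, :n]])
    P, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return 0.5 * (P + P.T)                                          # symmetrize round-off

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P = care_by_sign(A, B, Q, R)
residual = P @ B @ np.linalg.solve(R, B.T) @ P - A.T @ P - P @ A - Q   # (A.1a)
```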


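To alleviate the problems of computing $H_k^{-1}$, the Hamiltonian can be transformed into a symmetric form as follows [23],

$$\tilde H = JH = \begin{bmatrix} Q & A^T \\ A & -BR^{-1}B^T \end{bmatrix}, \qquad J = \begin{bmatrix} 0_n & -I_n \\ I_n & 0_n \end{bmatrix}. \qquad (A.4a)$$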
Then, the algorithm in (A.2) becomes

$$\tilde H_{k+1} = \tfrac{1}{2}\left[\tilde H_k + J\tilde H_k^{-1}J\right], \quad \tilde H_0 = JH, \quad \text{and} \quad \lim_{k\to\infty}(-J\tilde H_k) = \mathrm{Sign}(H). \qquad (A.4b)$$

The computation of the inverse of the symmetric matrix $\tilde H_k$ is much simpler than computing the inverse of $H_k$. The Riccati solution $P$ is again given by (A.3).
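The symmetric variant (A.4) is the same Newton iteration written for $\tilde H_k = JH_k$. Since $J^{-1} = -J$, we have $H_k = -J\tilde H_k$, and every iterate stays symmetric. The NumPy sketch below runs (A.4b) and compares the result with the direct iteration (A.2a). For brevity it uses a general inverse. The savings described in the text would come from a symmetric-indefinite factorization (for example $LDL^T$), which this sketch does not implement. Names and test data are illustrative.

```python
import numpy as np

def matrix_sign(X, tol=1e-13, max_iter=100):
    """Newton iteration (A.2a) on a general matrix."""
    X = np.array(X, dtype=float)
    for _ in range(max_iter):
        X_next = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_next - X, 1) <= tol * np.linalg.norm(X_next, 1):
            return X_next
        X = X_next
    return X

def sign_via_symmetric_form(A, B, Q, R, tol=1e-13, max_iter=100):
    """Iteration (A.4b) on the symmetric matrix J H of (A.4a); returns Sign(H)."""
    n = A.shape[0]
    BRB = B @ np.linalg.solve(R, B.T)
    Z, I = np.zeros((n, n)), np.eye(n)
    J = np.block([[Z, -I], [I, Z]])
    Ht = np.block([[Q, A.T], [A, -BRB]])                 # J H, symmetric
    for _ in range(max_iter):
        Ht_next = 0.5 * (Ht + J @ np.linalg.inv(Ht) @ J)
        Ht_next = 0.5 * (Ht_next + Ht_next.T)            # remove round-off asymmetry
        done = np.linalg.norm(Ht_next - Ht, 1) <= tol * np.linalg.norm(Ht_next, 1)
        Ht = Ht_next
        if done:
            break
    return -J @ Ht                                       # H_k = -J Ht_k -> Sign(H)

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
H = np.block([[A, -B @ np.linalg.solve(R, B.T)], [-Q, -A.T]])   # (A.1b)
S_sym = sign_via_symmetric_form(A, B, Q, R)
S_std = matrix_sign(H)
```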
Abstract


In this paper we consider a generalization of the singular value decomposition (SVD) that involves three matrices. We show how the decomposition can be used in important applications such as weighted least squares, and present a new computational procedure based on an implicit SVD method for a triple matrix product. Our algorithm is well suited for parallel implementation.

Keywords:  Singular  value  decomposition,  weighted  least  squares,  Jacobi 
methods,  parallel  computing 


1.  Introduction 

In this paper we develop a new algorithm for computing the HK-singular value decomposition (HK-SVD). The paper is organized as follows. Section 1 presents a description of the problem, its relation to the generalized singular value decomposition (GSVD), and an application in which the HK-SVD provides a powerful solution. Section 2 presents an implicit algorithm for computing the SVD of a product of three matrices and a new HK-SVD algorithm in which the implicit method is embedded. A summary and some final remarks conclude the paper in Section 3.

Notations. We make the standard choice to represent column vectors by bold lower case roman characters, matrices by upper case roman characters, and scalars by either Greek letters or roman letters with subscripts, as elements in vectors and matrices. In addition, the following notations are used:
$0_{n\times p}$ → an $n\times p$ block matrix with all zero elements

$I_p$ → a $p$-dimensional identity matrix

$A^{i,i+1}$ → a $2\times2$ matrix formed by intersecting rows and columns $i$ and $i+1$ of $A$


1.1. HK-SVD. Given three matrices $A$ ($n\times p$), $H$ ($n\times n$) and $K$ ($p\times p$), where $n \geq p$ and both $H$ and $K$ are symmetric positive definite, we wish to find two transformations $Y$ and $Z$ such that

$$Y^{-1}AZ = D, \qquad (1.1)$$

where

$$Y^THY = I_n \quad \text{and} \quad Z^TKZ = I_p,$$

and $D$ ($n\times p$) is diagonal (Van Loan [10]). We say that the matrices $Y$ and $Z$ are $H$-orthogonal and $K$-orthogonal, respectively.

A straightforward way [10] to compute the HK-SVD is to first determine the Cholesky factorizations

$$H = R_H^TR_H, \qquad K = R_K^TR_K, \qquad (1.2)$$

where $R_H$ and $R_K$ are upper triangular matrices, and then find an SVD of the product $R_HAR_K^{-1}$:

$$U^T(R_HAR_K^{-1})V = D, \qquad (1.3)$$

where $U$ and $V$ are orthogonal, and $D$ is diagonal. The two transformations $Y$ and $Z$ are given by

$$Y = R_H^{-1}U \quad \text{and} \quad Z = R_K^{-1}V. \qquad (1.4)$$

We will present a new algorithm for finding the HK-SVD via equation (1.3) without any explicit matrix inversions or product formations.
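To make definitions (1.1)-(1.4) concrete, the straightforward route can be written in a few lines of NumPy. This is the explicit-product baseline that the paper's algorithm is designed to avoid, not the new method. `np.linalg.cholesky` returns a lower-triangular $L$ with $H = LL^T$, so $R_H = L^T$. The function name and the random test matrices are illustrative.

```python
import numpy as np

def hk_svd_explicit(A, H, K):
    """Straightforward HK-SVD via (1.2)-(1.4); forms R_H A R_K^{-1} explicitly."""
    n, p = A.shape
    R_H = np.linalg.cholesky(H).T          # H = R_H^T R_H, R_H upper triangular
    R_K = np.linalg.cholesky(K).T          # K = R_K^T R_K
    C = R_H @ A @ np.linalg.inv(R_K)       # the product in (1.3)
    U, d, Vt = np.linalg.svd(C)            # U: n x n, Vt: p x p
    Y = np.linalg.solve(R_H, U)            # Y = R_H^{-1} U
    Z = np.linalg.solve(R_K, Vt.T)         # Z = R_K^{-1} V
    D = np.zeros((n, p))
    D[:p, :p] = np.diag(d)
    return Y, D, Z

rng = np.random.default_rng(0)
n, p = 5, 3
A = rng.standard_normal((n, p))
X = rng.standard_normal((n, n)); H = X @ X.T + n * np.eye(n)   # SPD
W = rng.standard_normal((p, p)); K = W @ W.T + p * np.eye(p)   # SPD
Y, D, Z = hk_svd_explicit(A, H, K)
```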


1.2. Weighted Least Squares. The HK-SVD is useful in finding the solution to a weighted least squares problem:

$$\|Ax - b\|_H = \min \quad \text{s.t.} \quad \|x\|_K = \min. \qquad (1.5)$$

The $M$-vector norm is defined by

$$\|y\|_M^2 = y^TMy,$$

where $M$ denotes a symmetric and positive definite matrix. We may reformulate the problem as

$$\|Ax - b\|_H = \|R_H(Ax - b)\|_2 = \|R_HAR_K^{-1}(R_Kx) - R_Hb\|_2 = \min,$$

subject to

$$\|R_Kx\|_2 = \min.$$

The procedure is to compute an SVD:

$$U^T(R_HAR_K^{-1})V = D, \qquad (1.6)$$

and solve the simple problem:

$$\|Dw - f\|_2 = \min \quad \text{s.t.} \quad \|w\|_2 = \min, \qquad (1.7)$$

where

$$w = V^TR_Kx = Z^{-1}x \quad \text{and} \quad f = U^TR_Hb = Y^{-1}b,$$

with $Y$ and $Z$ as defined in (1.4). We see that the HK-SVD provides an easy solution to the weighted least squares problem (1.5).
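The procedure (1.6)-(1.7) reduces to elementwise division once the SVD is available. Set $w_i = f_i/d_i$ for nonzero $d_i$ and $w_i = 0$ otherwise; this gives the minimum-norm solution of (1.7). Then recover $x = Zw$. The sketch below assumes NumPy and $n \ge p$. For a full-rank $A$ it should agree with the normal equations $A^THAx = A^THb$. The function name, the rank threshold `tol`, and the test data are illustrative choices.

```python
import numpy as np

def weighted_lsq_hk(A, b, H, K, tol=1e-12):
    """Minimum-K-norm solution of min ||Ax - b||_H via (1.6)-(1.7)."""
    n, p = A.shape
    R_H = np.linalg.cholesky(H).T
    R_K = np.linalg.cholesky(K).T
    U, d, Vt = np.linalg.svd(R_H @ A @ np.linalg.inv(R_K))   # (1.6), assumes n >= p
    f = U.T @ (R_H @ b)                                       # f = U^T R_H b = Y^{-1} b
    w = np.zeros(p)
    keep = d > tol * d.max()
    w[keep] = f[:p][keep] / d[keep]                           # (1.7): w_i = 0 where d_i = 0
    return np.linalg.solve(R_K, Vt.T @ w)                     # x = Z w = R_K^{-1} V w

rng = np.random.default_rng(3)
n, p = 6, 3
A = rng.standard_normal((n, p))
b = rng.standard_normal(n)
X = rng.standard_normal((n, n)); H = X @ X.T + n * np.eye(n)
W = rng.standard_normal((p, p)); K = W @ W.T + p * np.eye(p)
x = weighted_lsq_hk(A, b, H, K)
x_ref = np.linalg.solve(A.T @ H @ A, A.T @ H @ b)   # normal equations (full-rank A)
```

For a rank-deficient $A$ the normal equations are singular. In that case the $K$-norm constraint in (1.5) is what selects a unique solution, which the zeroed components of `w` implement.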


1.3 Previous Work. Our new algorithm is based on an implicit GSVD method (Paige [9] and Luk [5]), which computes an SVD of a $p\times p$ product $AB^{-1}$ without explicitly forming the product and without inverting $B$.

The SVD of a matrix product finds applications in many areas. For instance, it can be used in control theory to compute system balancing transformations (cf. Moore [8], Heath et al. [3] and Laub et al. [4]). That is, we find a contragredient transformation $P$ to diagonalize two given symmetric positive definite matrices $A$ and $B$:

$$P^TAP = P^{-1}BP^{-T} = \Lambda.$$

The transformation thus solves the generalized eigenvalue problem:

$$ABx = \lambda^2x.$$

One way to find $P$ is to compute the Cholesky decomposition of $B$, i.e., $B = R_B^TR_B$, and then calculate an eigenvalue decomposition of the symmetric matrix

$$U^T(R_BAR_B^T)U = \Lambda^2. \qquad (1.8)$$

We get the transformation as

$$P = R_B^TU\Lambda^{-1/2}.$$

Although equations (1.8) and (1.3) look similar, $A$ is symmetric positive definite here, so we may compute the Cholesky decomposition $A = R_A^TR_A$ and find an SVD of $R_AR_B^T$ [3], [4]. For equation (1.3), however, $A$ is not symmetric, and so we must consider an SVD of three matrices even when $H = K$.
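The balancing construction around (1.8) can be verified directly. The sketch below, assuming NumPy, builds $P = R_B^TU\Lambda^{-1/2}$ from a Cholesky factor and a symmetric eigendecomposition. It then confirms the following:

- $P^TAP$ and $P^{-1}BP^{-T}$ are both equal to $\Lambda$.
- The columns of $P^{-T}$ are eigenvectors of $AB$ with eigenvalues $\lambda_i^2$.

The function name and random SPD test matrices are illustrative.

```python
import numpy as np

def contragredient(A, B):
    """P with P^T A P = P^{-1} B P^{-T} = Lambda, following (1.8)."""
    R_B = np.linalg.cholesky(B).T                 # B = R_B^T R_B
    lam2, U = np.linalg.eigh(R_B @ A @ R_B.T)     # U^T (R_B A R_B^T) U = Lambda^2
    lam = np.sqrt(lam2)
    P = R_B.T @ U @ np.diag(lam ** -0.5)          # P = R_B^T U Lambda^{-1/2}
    return P, lam

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4)); A = X @ X.T + np.eye(4)
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + np.eye(4)
P, lam = contragredient(A, B)
P_inv = np.linalg.inv(P)
```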


2.  New  Algorithm 


In this section we derive a new algorithm for finding the HK-SVD via equation (1.3). First, we consider the special product

$$EFG^{-1}, \qquad (2.1)$$

where $E$, $F$ and $G$ are all $p\times p$ and upper triangular. We assume further that $G^{-1}$ exists. The special case of (2.1) where $E = I_p$ reduces to the GSVD problem for the two matrices $F$ and $G$.

In a Jacobi SVD algorithm we solve a sequence of $2\times2$ problems by finding rotation parameters to annihilate off-diagonal elements. An important issue is the order of elimination. Luk [6] chooses the odd-even ordering and outer rotations for an efficient parallel implementation. The convergence of this scheme has been proved (Luk and Park [7]), and the algorithm has been implemented on a massively parallel machine (Ewerbring and Luk [1], [2]). We define the odd and even index sets by

Odd-set → $\{1, 3, 5, \ldots, p-1\}$,
Even-set → $\{2, 4, 6, \ldots, p-2\}$,

assuming that $p$ is even. For an odd $p$, we define

Odd-set → $\{1, 3, 5, \ldots, p-2\}$,
Even-set → $\{2, 4, 6, \ldots, p-1\}$.


2.1. GSVD. The GSVD of $F$ and $G$ is computed via an SVD of the matrix

$$C = FG^{-1}. \qquad (2.2)$$

The procedure [5], [9] determines orthogonal transformations $U$, $V$ and $Q$ so that the two resulting matrices $U^TFQ$ and $V^TGQ$ have parallel rows, i.e.,

$$U^TFQ = D\,V^TGQ,$$

where $D$ is some diagonal matrix. We can easily check that

$$U^T(FG^{-1})V = D, \qquad (2.3)$$

which is just an SVD of $C$.

The special advantage of Luk's approach [5] is that it preserves the upper triangular structure of both $F$ and $G$. Indeed, the two matrices $G^{-1}$ and $C$ are also upper triangular. Consider a transformation in the $(i, i+1)$ plane, and denote by $M^{i,i+1}$ the $2\times2$ matrix formed by intersecting rows $i$, $i+1$ and columns $i$, $i+1$ of a $p\times p$ matrix $M$. Being triangular, the two matrices $G$ and $C$ satisfy these special relations:

$$(G^{-1})^{i,i+1} = (G^{i,i+1})^{-1}, \qquad C^{i,i+1} = F^{i,i+1}(G^{-1})^{i,i+1}.$$

The nonsingularity of $G^{i,i+1}$ follows trivially from the nonsingularity and the triangularity of $G$. We have thus proved that

$$C^{i,i+1} = F^{i,i+1}(G^{i,i+1})^{-1}, \qquad (2.4)$$

the key condition for an implicit computation of an SVD of $C$. So, let $U^{i,i+1}$ and $V^{i,i+1}$ denote rotations for a $2\times2$ SVD:

$$(U^{i,i+1})^TC^{i,i+1}V^{i,i+1} = S,$$

where $S$ is diagonal. We have

$$(U^{i,i+1})^TF^{i,i+1} = S\,(V^{i,i+1})^TG^{i,i+1},$$

i.e., the two rows of $(U^{i,i+1})^TF^{i,i+1}$ and $(V^{i,i+1})^TG^{i,i+1}$ are parallel. Hence we can find one single rotation, say $Q^{i,i+1}$, that will triangularize both $2\times2$ matrices $(U^{i,i+1})^TF^{i,i+1}$ and $(V^{i,i+1})^TG^{i,i+1}$ [5], [9].

How do these transformations affect the $p\times p$ upper triangular matrices $F$ and $G$? We have

$$F \leftarrow U_{i,i+1}^TFQ_{i,i+1}, \qquad G \leftarrow V_{i,i+1}^TGQ_{i,i+1},$$

where $U_{i,i+1}$, $V_{i,i+1}$ and $Q_{i,i+1}$ denote appropriate $p\times p$ rotations in the $(i, i+1)$-plane. Note that both $p\times p$ matrices $U_{i,i+1}^TF$ and $V_{i,i+1}^TG$ have only one nonzero subdiagonal element each, in the $(i+1, i)$-position. These two extraneous elements are annihilated by the same rotation $Q_{i,i+1}$.
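The $2\times2$ kernel of the implicit GSVD can be demonstrated in isolation. The sketch below, assuming NumPy, proceeds in four steps:

1. Form the $2\times2$ product $C^{i,i+1} = F^{i,i+1}(G^{i,i+1})^{-1}$, which is harmless at this size.
2. Take its SVD factors.
3. Build one rotation $Q$ from the second row of $V^TG^{i,i+1}$.
4. Confirm that the same $Q$ also zeroes the $(2,1)$ entry of $U^TF^{i,i+1}$, because the rows are parallel.

Luk's closed-form rotation formulas are not reproduced. `np.linalg.svd` may return reflections rather than rotations, which does not affect the parallel-row argument. Names and data are illustrative.

```python
import numpy as np

def rotation_zeroing_first(a, b):
    """2x2 rotation Q with [a, b] @ Q = [0, hypot(a, b)]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return np.eye(2)
    c, s = b / r, -a / r
    return np.array([[c, -s], [s, c]])

def gsvd_2x2(F2, G2):
    """One implicit 2x2 step: U^T F2 Q and V^T G2 Q upper triangular with parallel rows."""
    C2 = F2 @ np.linalg.inv(G2)              # 2x2 only, cf. (2.4)
    U, _, Vt = np.linalg.svd(C2)             # orthogonal 2x2 factors
    V = Vt.T
    Gv = V.T @ G2
    Q = rotation_zeroing_first(Gv[1, 0], Gv[1, 1])
    return U, V, Q, U.T @ F2 @ Q, Gv @ Q

F2 = np.array([[2.0, 1.0], [0.0, 0.5]])
G2 = np.array([[1.0, -0.7], [0.0, 1.5]])
U, V, Q, Fn, Gn = gsvd_2x2(F2, G2)
```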


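2.2. Algorithm PSVD. We extend the GSVD algorithm to the general case where $E \neq I_p$. First, consider the product (2.1). Define

$$C = EFG^{-1} \quad \text{and} \quad H = EF, \qquad (2.5)$$

even though we never intend to explicitly form either product. Once again, focus on a $2\times2$ problem that lies on the diagonal:

$$C^{i,i+1} = E^{i,i+1}F^{i,i+1}(G^{i,i+1})^{-1} = H^{i,i+1}(G^{i,i+1})^{-1}.$$

We find two rotations, say $U^{i,i+1}$ and $V^{i,i+1}$, to diagonalize the matrix $C^{i,i+1}$. The rotations are applied to $H^{i,i+1}$ and $G^{i,i+1}$:

$$H^{i,i+1} \leftarrow (U^{i,i+1})^TH^{i,i+1}, \qquad G^{i,i+1} \leftarrow (V^{i,i+1})^TG^{i,i+1}.$$

From previous discussions we learn that we can find one rotation $Q^{i,i+1}$ to restore both matrices to triangular forms:

$$H^{i,i+1} \leftarrow (U^{i,i+1})^TH^{i,i+1}Q^{i,i+1}, \qquad G^{i,i+1} \leftarrow (V^{i,i+1})^TG^{i,i+1}Q^{i,i+1}.$$

Naturally, we want to rotate $E^{i,i+1}$ and $F^{i,i+1}$ individually, and not their product:

$$E^{i,i+1} \leftarrow (U^{i,i+1})^TE^{i,i+1}, \qquad F^{i,i+1} \leftarrow F^{i,i+1}Q^{i,i+1}.$$

The fact that $H^{i,i+1}$ stays upper triangular means that another single rotation $W^{i,i+1}$ can be applied to maintain the triangularity of both $E^{i,i+1}$ and $F^{i,i+1}$:

$$E^{i,i+1} \leftarrow E^{i,i+1}W^{i,i+1}, \qquad (2.6)$$

and

$$F^{i,i+1} \leftarrow (W^{i,i+1})^TF^{i,i+1}.$$

We summarize our algorithm as follows.

Algorithm PSVD.

do until convergence
    alternate between i ∈ Odd-set and Even-set
    begin
        { U_{i,i+1} and V_{i,i+1} are "outer rotations" }
        determine U_{i,i+1} and V_{i,i+1} to
            annihilate c_{i,i+1} and c_{i+1,i};
        E ← U^T_{i,i+1} E;  G ← V^T_{i,i+1} G;
        find Q_{i,i+1} to zero out h_{i+1,i} and g_{i+1,i};
        F ← F Q_{i,i+1};  G ← G Q_{i,i+1};
        find W_{i,i+1} to zero out e_{i+1,i} and f_{i+1,i};
        E ← E W_{i,i+1};  F ← W^T_{i,i+1} F;
    end.  □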
By convergence we mean that the matrix $C$ has converged to a diagonal form $D = \mathrm{diag}(\gamma_i)$, with

$$\gamma_i = e_{ii}f_{ii}/g_{ii}.$$

The matrices of left and right singular vectors are given by $U$ and $V$, respectively.


2.3.  Algorithm  HK-SVD.  As  described  in  (1.3),  for  computing  the 
HK-SVD,  we  need  to  find  an  SVD  of  the  matrix  product 

C=RHARt  .  (2.7) 

To  make  use  of  the  implicit  algorithms  of  Section  2.2,  we  must  reduce  C  to  a 
product  of  upper  triangular  matrices.  To  accomplish  this,  compute  the  QR 
decomposition  (QRD)  of  A: 

A  =  QaRa  . 


where 


Ra  = 


Ra 

^{n-p)xp 


and  Ra  denotes  a  p  x  p  upper  triangular  matrix.  We  get 

RhAR}}  =(RhCIa)RaR-^  ■ 

Another  QRD  is  performed,  this  time  on  the  matrix  product  RhQa  ' 

RhQa  =  QrRh  •  (2.8) 

Thus,  the  problem  has  been  reduced  to  that  of  finding  an  SVD  of  the  product 

C=RhRaR1^  r  (2.9) 

where  Rh  is  n  x  n,  Ra  is  n  x  p  and  Rg  is  p  x  p.  So,  we  need  to  handle  the  dif¬ 
ferent  dimensions.  For  n  >p,  the  last  n  -  p  rows  and  columns  of  Rh  can  be 
discarded  because  the  last  n  -  p  rows  of  Ra  are  zero.  Hence,  set 

$$E \leftarrow \tilde I^T \tilde R_H \tilde I, \quad F \leftarrow \tilde I^T R_A, \quad G \leftarrow R_K, \qquad (2.10)$$

where

$$\tilde I = \begin{pmatrix} I_p \\ 0_{(n-p)\times p} \end{pmatrix},$$

and compute the diagonalization of the product $E F G^{-1}$ using Algorithm PSVD. Finally, set the $n \times n$ matrix of left singular vectors to be

$$Q_H \begin{pmatrix} U & 0_{p\times(n-p)} \\ 0_{(n-p)\times p} & I_{n-p} \end{pmatrix}$$

to account for the QRD of (2.8). We thus obtain our new algorithm:

Algorithm HK-SVD.

compute Cholesky factorizations:
    H = R_H^T R_H;  K = R_K^T R_K;
compute QR decomposition of A:
    A = Q_A R_A;
transform the matrix R_H:
    compute QR decomposition of R_H Q_A:
        R_H Q_A = Q_H R̃_H;
set
    E ← Ĩ^T R̃_H Ĩ,  F ← Ĩ^T R_A,  G ← R_K;
use Algorithm PSVD to find an SVD of E F G^{-1}. □
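The reduction performed by Algorithm HK-SVD can be checked numerically. The pure-Python sketch below works under the following assumptions:

- small dense matrices with made-up data (n = 3, p = 2);
- hypothetical helper names;
- Householder reflections for both QR decompositions.

It forms $R_H$ and $R_K$ by Cholesky, computes (2.8), builds E, F and G as in (2.10), and then checks that $R_H A R_K^{-1} = Q_H \begin{pmatrix} E F G^{-1} \\ 0 \end{pmatrix}$. Since $Q_H$ is orthogonal, the two sides share their singular values. The sketch stops before the final step: it does not implement Algorithm PSVD.

```python
import math

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def cholesky_upper(M):
    """Upper triangular R with M = R^T R (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][i] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return transpose(L)                               # R = L^T

def householder_qr(A):
    """Complete QR A = Q R: Q is m x m orthogonal, R is m x k upper trapezoidal."""
    m, k = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, k)):
        x = [R[i][j] for i in range(j, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])            # v = x + sign(x_1)||x|| e_1
        vv = sum(t * t for t in v)
        for col in range(k):                           # R <- (I - 2 v v^T / v^T v) R
            f = 2.0 * sum(v[i - j] * R[i][col] for i in range(j, m)) / vv
            for i in range(j, m):
                R[i][col] -= f * v[i - j]
        for row in range(m):                           # Q <- Q (I - 2 v v^T / v^T v)
            f = 2.0 * sum(Q[row][i] * v[i - j] for i in range(j, m)) / vv
            for i in range(j, m):
                Q[row][i] -= f * v[i - j]
    return Q, R

def upper_inverse(G):
    """Inverse of an upper triangular matrix by back substitution."""
    n = len(G)
    X = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            rhs = float(i == col) - sum(G[i][t] * X[t][col] for t in range(i + 1, n))
            X[i][col] = rhs / G[i][i]
    return X

def hk_reduce(A, H, K):
    """Return Q_H, E, F, G (= R_K) and R_H as in Algorithm HK-SVD and (2.10)."""
    p = len(A[0])
    RH, RK = cholesky_upper(H), cholesky_upper(K)     # H = R_H^T R_H, K = R_K^T R_K
    QA, RA = householder_qr(A)                        # A = Q_A R_A
    QH, RHt = householder_qr(matmul(RH, QA))          # R_H Q_A = Q_H tilde R_H   (2.8)
    E = [row[:p] for row in RHt[:p]]                  # leading p x p block of tilde R_H
    F = [row[:] for row in RA[:p]]                    # bar R_A
    return QH, E, F, RK, RH

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]              # n = 3, p = 2 (made-up data)
H = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
K = [[2.0, 1.0], [1.0, 2.0]]

QH, E, F, G, RH = hk_reduce(A, H, K)
Ginv = upper_inverse(G)
C = matmul(matmul(RH, A), Ginv)                       # R_H A R_K^{-1}, formed explicitly
M = matmul(matmul(E, F), Ginv)                        # E F G^{-1}, the p x p product for PSVD
padded = M + [[0.0] * len(M[0]) for _ in range(len(A) - len(M))]
err = max(abs(a - b) for ra, rb in zip(C, matmul(QH, padded)) for a, b in zip(ra, rb))
print(err)                                            # rounding-level if the reduction is exact
```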


3.  Final  Remarks 

This paper presents an implicit algorithm for computing the SVD of a product of three matrices. The algorithm plays an integral role in the new method in Section 2.3 for computing the HK-SVD. The applicability of the algorithm was exemplified in the solution to a weighted least squares problem, which, for instance, arises in a specific aircraft problem.

All  problems  in  the  paper  call  for  the  diagonalization  of  a  product  of 
three,  not  necessarily  symmetric,  matrices.  The  extension  of  our  methods  to  a 
product  of  more  matrices  is  straightforward.  Although  we  assume  that  the 
inverses  in  (2.1)  exist,  our  algorithms  can  easily  be  adapted  for  rank 
deficiency  by  using  matrix  adjoints  (cf.  Paige  [9]). 

Our new algorithm was simulated on a VAX 11/750 using MATLAB. It is well suited for a massively parallel computer; Ewerbring and Luk [1], [2] presented implementations of the SVD and GSVD methods described in this paper on the 65,536-processor Connection Machine.

ABSTRACT. We present an algorithm, called $A^G$, for finding the least-cost path from start node to goal node set in an OR-graph, where arc costs are scalar-valued and the cost of each path is the sum of the concomitant arc costs. Search is guided by a set, $H$, of real-valued functions on the node set. If $H = \{h : \ell \le h\}$ for given function $\ell$, then $A^G$ essentially becomes $A^*$. If $H$ is bounded, then not all successors of the newly expanded node need be placed on OPEN. We address the issue of admissibility. A new concept, the completeness of a heuristic set with respect to a path in the graph, is introduced.

INTRODUCTION. In this paper, we present a generalization of $A^*$, which we call $A^G$. The key characteristic that distinguishes $A^G$ from $A^*$ is that knowledge used to guide $A^G$ is represented by a set of heuristic functions, or a heuristic set, rather than by a single heuristic function (or, more precisely, a specially structured heuristic set induced by a single heuristic function). A key result of this characteristic is that it may not be necessary to place on OPEN all the successor nodes of a node chosen for expansion. A possible implication of this result is that the OPEN set will tend to be smaller and hence easier to store and to sort.

There  are  at  least  three  reasons  for  allowing  knowledge  to  be 
represented  by  a  set  of  heuristic  functions  in  order  to  guide  search. 
First,  more  information  about  the  perfect  heuristic  may  be  available  than 
just  a  lower  bound,  and  this  information  may  be  such  that  it  can  be 
represented  by  set  inclusion.  Second,  it  seems  reasonable  that  more  (or 
better)  information  for  search  guidance  would  not  degrade  the  quality  of  the 
search procedure, although this may not always be true in general (White and Harrington, 1980). Third, upper and lower bound information has proven very useful in action elimination algorithms for Markov decision processes (e.g., Puterman and Shin, 1982), a problem formulation of particular interest to us.


The outline of this paper and its results follow the basic outline of Section 3.1 (Pearl, 1984). We begin by defining the problem of interest and setting terminology. The $A^G$ algorithm is presented in Section 2. Termination and the completeness of a heuristic set, a new concept, are the topics of Section 3. Section 4 is concerned with admissibility.

Future research will involve comparing $A^G$ with different admissible heuristic sets and investigating the computational significance of $A^G$.


1.  PROBLEM  DEFINITION  and  TERMINOLOGY 


Let $N$ represent the countable set of nodes in the OR graph. The set $A \subseteq N \times N$ is the set of directed arcs. Node $s \in N$ represents the start node; the finite set $\Gamma \subseteq N$, having generic element $\gamma$, represents the goal node set. We let $G = (N, A, s, \Gamma)$ designate the graph under consideration.

Let $SCS: N \to 2^N$ be the successor set function, where $SCS(n)$ represents the set of all nodes $n' \in N$ such that $(n, n') \in A$. We assume throughout that $SCS(n)$ is finite for all $n \in N$.

A path $P = (n_1, \ldots, n_K)$ is a sequence of nodes such that $n_{k+1} \in SCS(n_k)$ for all $k = 1, \ldots, K-1$. Let $P(n,S)$ be the set of all finite-length, acyclic paths from $n \in N$ to $S \subseteq N$. Notationally, if $S$ is a singleton, i.e., if $S = \{n'\}$, then we will write $P(n,S) = P(n,n')$.

The function $c: A \to \mathbb{R}$ is the arc cost function; the cost assigned to a path is assumed to be the sum of the concomitant arc costs. Throughout, we assume that there is a constant $\delta > 0$ such that $\delta \le c(a)$ for all $a \in A$. Notationally, we will often replace $c(a)$ with $c(n,n')$, where $a = (n,n')$.

The problem objective is to find a minimum cost path in $P(s,\Gamma)$. Let $P^*(n,S) \subseteq P(n,S)$ represent the set of all optimal, i.e., minimal cost, paths from $n \in N$ to $S \subseteq N$. Thus, we seek a path in $P^*(s,\Gamma)$.

Heuristic information will prove useful in meeting our objective. We assume that this information is represented in set form. Specifically, let $\mathcal{H}$ be the set of all real-valued functions on $N$. We call a given subset $H \subseteq \mathcal{H}$ the heuristic set. We will assume that search for a path in $P^*(s,\Gamma)$ is guided by a given heuristic set. This is in contrast to the heuristic search procedure $A^*$, which assumes that search is guided by a given heuristic function, i.e., an element, rather than a subset, of $\mathcal{H}$.

Several functions in $\mathcal{H}$ will prove to be important in developments to follow. Let $g$ be the current path cost function, where $g(n)$ represents the cost of the current path from $s$ to $n$ and where $g(s) = 0$. The function $g^*$ is such that $g^*(n)$ represents the minimal cost of paths going from $s$ to $n$. For given heuristic set $H$, let $\ell$, the lower bound function of $H$, be defined as

$$\ell(n) = \inf\{h(n) : h \in H\} \quad \text{for all } n \in N.$$

Define $f$ as $f(n) = g(n) + \ell(n)$ for all $n \in N$. Also, let $h^*$ represent the perfect heuristic function, which must satisfy the following dynamic program:

$$h^*(n) = \min\{c(n,n') + h^*(n') : n' \in SCS(n)\},$$
$$h^*(\gamma) = 0, \quad \gamma \in \Gamma,$$
$$h^*(n) = \infty \quad \text{if } SCS(n) \text{ is empty.}$$

We let $C^*$ represent the minimal cost of paths going from $s$ to $\Gamma$. Thus, $C^* = h^*(s)$. Note that our objective is to determine a path in $P(s,\Gamma)$ having cost $C^*$.
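A minimal Python sketch of the dynamic program for $h^*$. It assumes a finite graph stored as a dictionary that maps each node to its successors and arc costs; the graph, the node names and the function name `perfect_heuristic` are illustrative and do not come from the paper. The equations are relaxed repeatedly in Bellman-Ford style, starting from $h = \infty$ off the goal set. With positive arc costs the values stop changing after finitely many sweeps.

```python
import math

def perfect_heuristic(succ, goals):
    """Solve h*(n) = min{c(n, n') + h*(n') : n' in SCS(n)}, h*(gamma) = 0,
    h*(n) = inf for empty SCS(n), by repeated relaxation on a finite graph.

    succ maps each node n to a dict {n': c(n, n')}; goals is the set Gamma.
    """
    nodes = set(succ) | set(goals) | {m for arcs in succ.values() for m in arcs}
    h = {n: (0.0 if n in goals else math.inf) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n in goals:
                continue                      # h*(gamma) = 0 for gamma in Gamma
            arcs = succ.get(n, {})
            best = min((c + h[m] for m, c in arcs.items()), default=math.inf)
            if best < h[n]:                   # an empty SCS(n) leaves h*(n) = inf
                h[n] = best
                changed = True
    return h

# Small made-up example graph; arc costs are illustrative only.
succ = {"s": {"a": 1.0, "b": 4.0, "d": 1.0},
        "a": {"b": 1.0, "goal": 5.0},
        "b": {"goal": 1.0},
        "d": {}}
h_star = perfect_heuristic(succ, {"goal"})
print(h_star["s"])   # C* = h*(s) = 3.0
```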


Let $U: A \times 2^{\mathcal{H}} \to \mathbb{R}$ be called the node expansion function, which we define as:

$$U(n,n',H) = \sup\{h(n) - h(n') : h \in H\}.$$


2. THE $A^G$ ALGORITHM

We now state the $A^G$ algorithm:

0. Initialization. Set OPEN equal to the set containing only the start node and set CLOSED to the empty set.

1. If OPEN is empty, then terminate with failure.

2. Remove from OPEN and place on CLOSED a node $n$ for which $f(n) = g(n) + \ell(n)$ is minimum with respect to all nodes in OPEN.

3. If $n$ is a goal node, then trace through backpointers from $n$ to $s$ to determine the solution path and terminate successfully.

4. If $n$ is not a goal node, generate its successors. If $n$ has no successors, then go to Step 5. Otherwise, for all successors $n'$ of $n$, compute $U(n,n',H)$.

   a. If $n' \notin \mathrm{OPEN} \cup \mathrm{CLOSED}$ and $U(n,n',H) \ge c(n,n')$, then add $n'$ to OPEN and add a backpointer from $n'$ to $n$.

   b. If $n' \notin \mathrm{OPEN} \cup \mathrm{CLOSED}$ and $U(n,n',H) < c(n,n')$, then go to Step 5.

   c. If $n' \in \mathrm{OPEN} \cup \mathrm{CLOSED}$ and $U(n,n',H) \ge c(n,n')$, then direct its pointers along the path yielding the lowest $g(n')$ and put $n'$ on OPEN if pointer adjustment was required.

   d. If $n' \in \mathrm{OPEN} \cup \mathrm{CLOSED}$ and $U(n,n',H) < c(n,n')$, then go to Step 5.

5. Increment the iteration counter and go to Step 1.
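Steps 0 to 5 can be sketched in Python for a finite heuristic set, where the supremum in $U$ becomes a maximum. This is an illustration under stated assumptions, not the authors' implementation:

- "go to Step 5" in Steps 4b and 4d is read as "leave $n'$ aside and continue with the next successor of $n$";
- each heuristic is stored as a dictionary over the nodes;
- the example graph, with $H = \{h^*, 0\}$ (admissible because it contains $h^*$), is made up.

```python
import math

def a_g(succ, start, goals, H):
    """Sketch of Steps 0-5 of A^G for a finite heuristic set H.

    succ: dict n -> {n': c(n, n')}; H: list of dicts, each a function h on the nodes.
    Returns (path, cost) on success, or None on failure.
    """
    def ell(n):                                  # lower bound function of H
        return min(h[n] for h in H)

    def U(n, m):                                 # node expansion function (sup = max, H finite)
        return max(h[n] - h[m] for h in H)

    g = {start: 0.0}                             # current path costs
    parent = {start: None}                       # backpointers
    OPEN, CLOSED = {start}, set()                # Step 0
    while OPEN:                                  # Step 1: empty OPEN means failure
        n = min(OPEN, key=lambda v: g[v] + ell(v))   # Step 2: minimize f = g + ell
        OPEN.remove(n)
        CLOSED.add(n)
        if n in goals:                           # Step 3: trace backpointers
            cost, path = g[n], []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], cost
        for m, c in succ.get(n, {}).items():     # Step 4
            if U(n, m) < c:                      # Steps 4b/4d: arc (n, m) is not used
                continue
            new_g = g[n] + c
            if m not in OPEN and m not in CLOSED:    # Step 4a
                g[m], parent[m] = new_g, n
                OPEN.add(m)
            elif new_g < g[m]:                   # Step 4c: redirect pointer, (re)open m
                g[m], parent[m] = new_g, n
                CLOSED.discard(m)
                OPEN.add(m)
    return None                                  # Step 5 is the next pass of the loop

succ = {"s": {"a": 1.0, "b": 4.0, "d": 1.0},
        "a": {"b": 1.0, "goal": 5.0},
        "b": {"goal": 1.0},
        "d": {}}
h_star = {"s": 3.0, "a": 2.0, "b": 1.0, "goal": 0.0, "d": math.inf}
zero = {n: 0.0 for n in h_star}
print(a_g(succ, "s", {"goal"}, [h_star, zero]))   # (['s', 'a', 'b', 'goal'], 3.0)
```

In this run the arcs $(s,b)$, $(s,d)$ and $(a,\text{goal})$ fail the test of Step 4, so OPEN never holds more than one node. This is the kind of saving described in the introduction.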

Step 4b represents the major new feature of $A^G$, relative to $A^*$. Justification for this step is as follows. Assume $h^* \in H$, a condition on $H$ that we will later refer to as admissibility. Then $U(n,n',H) < c(n,n')$ implies that $h^*(n) - h^*(n') < c(n,n')$, or equivalently

$$h^*(n) < c(n,n') + h^*(n').$$

It then follows from the dynamic programming equation describing $h^*$ that $n'$ is not a minimizing element in $SCS(n)$, and hence $n'$ is not on an optimal path from $n$ to $\Gamma$. Thus, in searching for a path in $P^*(s,\Gamma)$, it will never be useful to consider a path in $P(s,\Gamma)$ containing arc $(n,n')$.

The heuristic function providing guidance to $A^*$ is said to be admissible if it represents a lower bound on the perfect heuristic. It is therefore natural to think of heuristic functions and lower bound functions as being analogous. Let $H = \{h \in \mathcal{H} : \ell \le h\}$ be the heuristic set induced by the lower bound function $\ell$. Then $U(n,n',H) \ge c(n,n')$ for all $(n,n') \in A$ (indeed $U(n,n',H) = +\infty$, since $H$ contains functions that are arbitrarily large at $n$), and $A^G$ essentially becomes $A^*$. Thus, we consider the concept of a heuristic set to be a generalization of the concept of a heuristic function, and hence $A^G$ to be a generalization of $A^*$.

3. TERMINATION and COMPLETENESS

Assumptions on SCS and c insure the following result. Its proof is a straightforward adaptation of the concomitant result for $A^*$ (see pp. 76-77 in Pearl, 1984).

THEOREM 1. $A^G$ terminates after a finite number of iterations.

We now present a sufficient condition for $A^G$ to be complete, i.e., to terminate with a path in $P(s,\Gamma)$, assuming $P(s,\Gamma)$ is not empty.

DEFINITION. The heuristic set $H$ is complete with respect to the path $(n_1, \ldots, n_K)$ if $U(n_k, n_{k+1}, H) \ge c(n_k, n_{k+1})$ for all $k = 1, \ldots, K-1$.

Let $H$ be the heuristic set induced by the bounded lower bound function $\ell$. Then $U(n,n',H) \ge c(n,n')$ for all $(n,n') \in A$, and hence the heuristic set induced by any bounded lower bound function is trivially complete with respect to any path in the graph. We remark that this fact eliminates the need to define complete heuristic functions for the $A^*$ algorithm.


THEOREM 2. For infinite graph $G$, assume that the heuristic set $H$ is complete with respect to a path in $P(s,\Gamma)$. Then $A^G$ is complete on $G$.

Proof: The completeness of $H$ insures that at least one node from at least one solution path is always on OPEN prior to termination. The result then follows as for $A^*$; e.g., see the proof of Theorem 1, p. 77, in Pearl, 1984. □


4. ADMISSIBILITY

We now present a condition which will insure that $A^G$ is admissible, i.e., $A^G$ will terminate with a path in $P^*(s,\Gamma)$, assuming $P^*(s,\Gamma)$ is nonempty.

DEFINITION: The heuristic set $H$ is admissible if $h^* \in H$.

Note that if $H$ is admissible, then $\ell \le h^*$, where $\ell$ is the lower bound function induced by $H$. An important relationship between heuristic set admissibility and completeness is now presented.

LEMMA 1. Assume that the heuristic set $H$ is admissible. Then $H$ is complete with respect to every path in $P^*(s,\Gamma)$.

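Proof: Let $(n_1, \ldots, n_K) \in P^*(s,\Gamma)$. Then there exists an $h \in H$, namely $h^*$, such that

$$h^*(n_k) = c(n_k, n_{k+1}) + h^*(n_{k+1})$$

for all $k = 1, \ldots, K-1$, and hence

$$h^*(n_k) - h^*(n_{k+1}) = c(n_k, n_{k+1})$$

for all $k = 1, \ldots, K-1$. Therefore, for all $k = 1, \ldots, K-1$,

$$U(n_k, n_{k+1}, H) \ge h^*(n_k) - h^*(n_{k+1}) = c(n_k, n_{k+1}). \qquad \square$$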


Let $P^c(n,S,H)$ be the set of all paths in $P(n,S)$ for which $H$ is complete. The following result is then a corollary to Lemma 1.

LEMMA 2. Assume $H$ is admissible. Then $P^*(s,\Gamma) \subseteq P^c(s,\Gamma,H) \subseteq P(s,\Gamma)$.

We observe that if $H = \{h \in \mathcal{H} : \ell \le h\}$ for some bounded heuristic $\ell$, then $P^c(s,\Gamma,H) = P(s,\Gamma)$, and if $H$ contains only the perfect heuristic, then $P^c(s,\Gamma,H) = P^*(s,\Gamma)$.
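Lemma 2 can be illustrated by brute force on a small made-up graph. The sketch below makes two simplifying assumptions: the heuristic set is finite, $H = \{h^*, 0\}$, so the supremum in $U$ is a maximum; and every acyclic path from $s$ to the goal set is enumerated explicitly. All names are hypothetical. The sketch classifies each path as optimal and/or one for which $H$ is complete, and checks $P^* \subseteq P^c \subseteq P$.

```python
import math

succ = {"s": {"a": 1.0, "b": 4.0, "d": 1.0}, "a": {"b": 1.0, "goal": 5.0},
        "b": {"goal": 1.0}, "d": {}, "goal": {}}
h_star = {"s": 3.0, "a": 2.0, "b": 1.0, "goal": 0.0, "d": math.inf}
zero = {n: 0.0 for n in h_star}
H = [h_star, zero]                     # admissible: h* is a member of H

def acyclic_paths(n, goals, seen=()):
    """All finite acyclic paths from n to the goal set, i.e. P(n, Gamma)."""
    seen = seen + (n,)
    if n in goals:
        yield list(seen)
        return
    for m in succ[n]:
        if m not in seen:
            yield from acyclic_paths(m, goals, seen)

def cost(path):
    return sum(succ[a][b] for a, b in zip(path, path[1:]))

def complete(path):
    """H is complete w.r.t. path iff U(n_k, n_k+1, H) >= c(n_k, n_k+1) for every arc."""
    return all(max(h[a] - h[b] for h in H) >= succ[a][b] for a, b in zip(path, path[1:]))

P = list(acyclic_paths("s", {"goal"}))
c_star = min(cost(q) for q in P)
P_star = [q for q in P if cost(q) == c_star]
P_c = [q for q in P if complete(q)]
assert all(q in P_c for q in P_star)   # Lemma 2: P* is contained in P^c(s, Gamma, H)
print(len(P), len(P_c), len(P_star))   # 3 1 1
```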

LEMMA 3. Let $H$ be complete with respect to path $P \in P^*(s,n'')$, where $n''$ is not necessarily a node in $\Gamma$.

(a) If there exists a shallowest node, $n'$, on $P$ in OPEN, then $g(n') = g^*(n')$. Furthermore, all ancestors, $n^a$, of $n'$ on $P$ are on CLOSED and are such that $g(n^a) = g^*(n^a)$.

(b) If there does not exist a shallowest node on $P$ in OPEN, then all nodes, $n$, on $P$ are in CLOSED and are such that $g(n) = g^*(n)$.

Lemma 3 indicates that $A^G$ has already found the optimal pointer-path to $n'$ (along the path in $P^*(s,n'')$) and that this pointer-path will remain unaltered throughout the search.


Proof: By induction. We will show that for all iterations, either

i. there exists a shallowest node, $n'$, on $P$ in OPEN, $g(n') = g^*(n')$, and all ancestors of $n'$ on $P$, $n^a$, are on CLOSED and are such that $g(n^a) = g^*(n^a)$, or

ii. there does not exist a shallowest node on $P$ in OPEN.

We begin by proving an intermediate result: assume $P \cap \mathrm{OPEN} = \emptyset$; then $P \subseteq \mathrm{CLOSED}$ and $g(n) = g^*(n)$ for all $n \in P$. Note that $P \cap \mathrm{OPEN} = \emptyset$ cannot hold initially, since $A^G$ places $s$ on OPEN at the beginning of iteration 1. Observe, however, that at the beginning of iteration 2, $s \in \mathrm{CLOSED}$ and $g(s) = g^*(s) = 0$. More generally, assume $n \in P \cap \mathrm{CLOSED}$, $n$ is an ancestor of $n''$, and $g(n) = g^*(n)$. Since $n$ is on CLOSED, $n$ has been expanded. Since $H$ is complete with respect to $P$, the successor $n' \in P \cap SCS(n)$ is placed on $\mathrm{OPEN} \cup \mathrm{CLOSED}$. But since $P \cap \mathrm{OPEN} = \emptyset$, $n' \in \mathrm{CLOSED}$. The optimality of $P$ implies $g(n') = g(n) + c(n,n') = g^*(n) + c(n,n') = g^*(n')$. The intermediate result then follows by induction.

Consider iteration 1. Node $s$ is the shallowest node on OPEN, $g(s) = g^*(s) = 0$, and $s$ has no ancestors. So the result holds at iteration 1.

Assume the result holds at iteration $k$. If there does not exist a shallowest node on $P$ in OPEN, then by the above intermediate result, all nodes on $P$ are on CLOSED and are not candidates for pointer-path readjustment. Therefore, all nodes on $P$ will remain on CLOSED, and hence there will continue to be no shallowest node on $P \cap \mathrm{OPEN}$. Thus the result holds for iteration $k + 1$.

Assume there does exist a shallowest node $n \in P$ in OPEN such that $g(n) = g^*(n)$. Furthermore, assume all ancestors $n^a$ of $n$ on $P$ are such that $n^a \in \mathrm{CLOSED}$ and $g(n^a) = g^*(n^a)$. If $n$ is not expanded, $n$ remains the shallowest node on $P \cap \mathrm{OPEN}$, since all ancestors of $n$ are not candidates for pointer-path readjustment. Hence, the result holds for iteration $k + 1$.

Assume $n$ is expanded and $n$ is an ancestor of $n''$. (If $n = n''$, then the result holds trivially.) Since $H$ is complete with respect to $P$, $n' \in P \cap SCS(n)$ will be placed on $\mathrm{OPEN} \cup \mathrm{CLOSED}$. Prior to the expansion of $n$, three cases are possible: (i) $n' \notin \mathrm{OPEN} \cup \mathrm{CLOSED}$, (ii) $n' \in \mathrm{OPEN}$, and (iii) $n' \in \mathrm{CLOSED}$.

Assume $n' \notin \mathrm{OPEN} \cup \mathrm{CLOSED}$. Then $n'$ will be placed on OPEN, becoming the new shallowest node on OPEN, and $g(n') = g(n) + c(n,n') = g^*(n')$. Hence the result holds for iteration $k + 1$.

Assume $n' \in \mathrm{OPEN}$. Then $n'$ will remain on OPEN, becoming the new shallowest node on OPEN, and pointer-path readjustment may have to take place in order to insure that $g(n') = g^*(n')$. Hence the result holds for iteration $k + 1$.

Assume $n' \in \mathrm{CLOSED}$. If pointer-path readjustment is required, $n'$ is placed on OPEN, becoming the new shallowest node on OPEN, and $g(n') = g^*(n')$. Hence the result holds for iteration $k + 1$.

If $n' \in \mathrm{CLOSED}$ and pointer-path readjustment is not required, then $n'$ remains on CLOSED and $g(n') = g^*(n')$. Since $n'$ was on CLOSED, it has been expanded. Use of induction, the completeness of $H$ with respect to $P$, the finite length of $P$, and the optimality of $P$ guarantees either (α) or (β), where:

(α) There is a descendant of $n'$ on $P \cap \mathrm{OPEN}$, $n^d$, such that $g(n^d) = g^*(n^d)$, and each ancestor of $n^d$ on $P$ that is a descendant of $n'$, $n^+$, is such that $g(n^+) = g^*(n^+)$. Hence the result holds for iteration $k + 1$.

(β) Each descendant of $n'$ on $P$, $n^d$, is on CLOSED and is such that $g(n^d) = g^*(n^d)$. Hence $P \cap \mathrm{OPEN} = \emptyset$, and the result holds for iteration $k + 1$.

□

The following example indicates that if $H$ is not complete with respect to a path in $P^*(s,n'')$, $n''$ not necessarily in $\Gamma$, then Lemma 3 may not hold, even if $H$ is admissible.

EXAMPLE 1: Let $N = \{s, n_1, n_2, n_3, \gamma\}$ and $\Gamma = \{\gamma\}$. The sets $SCS(\cdot)$, the cost structure $c(\cdot,\cdot)$, and the resulting function $h^*(\cdot)$ are given in Table 1. Let $H = \{h, h^*\}$, where $h(n) = h^*(n)$ for all $n \in N$ except one node, at which $h$ is set to 0. We note that $H$ is admissible. Let $\mathrm{OPEN}_k$ and $\mathrm{CLOSED}_k$ be the OPEN and CLOSED sets at the beginning of the $k$th iteration of $A^G$. Then:

$\mathrm{OPEN}_1 = \{s\}$, $\mathrm{CLOSED}_1 = \emptyset$
$\mathrm{OPEN}_2 = \{n_1\}$, $\mathrm{CLOSED}_2 = \{s\}$
$\mathrm{OPEN}_3 = \{n_2\}$, $\mathrm{CLOSED}_3 = \{s, n_1\}$
$\mathrm{OPEN}_4 = \{n_3\}$, $\mathrm{CLOSED}_4 = \{s, n_1, n_2\}$

Node $n_2$ was not placed on OPEN during iteration 1 because $U(s,n_2,H) < c(s,n_2)$. We note that

$$g(n_3) = c(s,n_1) + c(n_1,n_2) + c(n_2,n_3) = 3,$$

whereas

$$g^*(n_3) = c(s,n_2) + c(n_2,n_3) = 2. \qquad \square$$
TABLE 1: Data for Example 1.
LEMMA 4. Assume $H$ is admissible and that path $P \in P^*(s,\Gamma)$. Then at any time before $A^G$ terminates, there exists an OPEN node $n'$ on $P$ such that $g(n') = g^*(n')$ and $f(n') \le C^*$.

Proof: The admissibility of $H$ and Lemma 1 imply that $H$ is complete with respect to path $P$. Assume there does not exist a node $n' \in P \cap \mathrm{OPEN}$. Then by Lemma 3b, all nodes on $P$ are on CLOSED, including a goal node. But a goal node on CLOSED implies that $A^G$ has terminated, which is a contradiction. Therefore, there exists a shallowest node, $n'$, on $P$ in OPEN. By Lemma 3a, $g(n') = g^*(n')$. Since $P \in P^*(s,\Gamma)$ and since $H$ is admissible,

$$f(n') = g(n') + \ell(n') = g^*(n') + \ell(n') \le g^*(n') + h^*(n') = C^*. \qquad \square$$

THEOREM 3. Assume the heuristic set $H$ is admissible. Then $A^G$ is admissible.

Proof: Assume there exists an optimal path $P \in P^*(s,\Gamma)$ with cost $C^*$. Since $H$ is admissible, by Lemma 1 $H$ is complete with respect to $P$. By Lemma 4, at any time before $A^G$ terminates there exists a node $n' \in \mathrm{OPEN} \cap P$ such that $g(n') = g^*(n')$ and $f(n') \le C^*$. Therefore OPEN is never empty before termination, so $A^G$ cannot terminate with failure; it cannot terminate until it has expanded a goal node $\gamma$.

At the time $A^G$ selects $\gamma$ for expansion, there exists a node $n' \in \mathrm{OPEN} \cap P$ such that $g(n') = g^*(n')$ and $f(n') \le C^*$. Thus, for $A^G$ to choose $\gamma$ for expansion, $f(\gamma) \le f(n') \le C^*$. Hence, $f(\gamma) = C^*$, and so $A^G$ has found an optimal path. □


REFERENCES 

White, C.C., and Harrington, D.P., "Application of Jensen's Inequality for Adaptive Suboptimal Design," Journal of Optimization Theory and Applications, Vol. 32, pp. 89-99, 1980.

Puterman, M.L., and Shin, M.C., "Action Elimination Procedure for Modified Policy Iteration Algorithms," Operations Research, Vol. 30, pp. 301-318, 1982.

Pearl, J., Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, Reading, MA, 1984.


Abstract. This paper presents a new parallel version of the Householder algorithm with column pivoting for computing the QR factorization of a matrix. In contrast to the standard algorithm, we employ a local pivoting scheme that allows for efficient implementation of the algorithm on a parallel machine, in particular one with a distributed architecture. An inexpensive but reliable incremental condition estimator is used to control the selection of pivot columns by obtaining cheap estimates for the smallest singular value of the currently created upper triangular matrix R. Numerical experiments show that the local pivoting strategy behaves about as well as the traditional global pivoting strategy. They also show the advantages of incorporating the controlled pivoting strategy into the traditional QR algorithm to guard against the known pathological cases.

1  Introduction 

One of the standard problems in numerical linear algebra is the solution of the linear least squares problem

$$\min_x \|Ax - b\|_2 \qquad (1)$$

where A is an m × n (m ≥ n) matrix. The common way to approach this problem [12,17,19] is via a QR factorization

$$AP = QR \qquad (2)$$

of A. Here P is an n × n permutation matrix, Q is an m × n matrix with orthonormal columns (i.e. $Q^T Q = I_n$) and R is an upper triangular n × n matrix.

This work was supported by the U.S. Army Research Office through the Mathematical Science Institute of Cornell University, by the Office of Naval Research under contract N00014-83-K-0640 and by NSF contract CCR 86-03310.

If A is a dense matrix, Q is usually computed by a sequence of Householder transformations


$$H = I - 2uu^T.$$

Choosing

$$u = \frac{x + \mathrm{sign}(x_1)\,\|x\|_2\, e_1}{\|x + \mathrm{sign}(x_1)\,\|x\|_2\, e_1\|_2}$$

we can reduce a given vector x to a multiple of the canonical unit vector $e_1$, since

$$(I - 2uu^T)x = -\mathrm{sign}(x_1)\,\|x\|_2\, e_1. \qquad (3)$$
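A minimal Python sketch of the reflection in (3) on a made-up vector. It assumes $x \ne 0$, and it takes $\mathrm{sign}(0) = +1$, which is a convention the text does not state. The reflector is applied as $y - 2u(u^T y)$ without forming the matrix $I - 2uu^T$. The helper names are hypothetical.

```python
import math

def householder_vector(x):
    """u = (x + sign(x_1)||x||_2 e_1) / ||x + sign(x_1)||x||_2 e_1||_2 (x nonzero)."""
    norm_x = math.sqrt(sum(t * t for t in x))
    v = list(x)
    v[0] += math.copysign(norm_x, x[0])   # sign(0) treated as +1 (assumption)
    norm_v = math.sqrt(sum(t * t for t in v))
    return [t / norm_v for t in v]

def apply_reflector(u, y):
    """Compute (I - 2 u u^T) y without forming the matrix."""
    d = 2.0 * sum(a * b for a, b in zip(u, y))
    return [b - d * a for a, b in zip(u, y)]

x = [3.0, 1.0, 5.0, 1.0]                  # ||x||_2 = 6 and x_1 > 0
y = apply_reflector(householder_vector(x), x)
print(y)                                  # approximately [-6, 0, 0, 0], as (3) predicts
```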


If A has full rank, we can avoid exchanging columns when computing the QR factorization, i.e. P in (2) will be the identity. If the rank of A is not known, we can employ column pivoting [3]. The idea is to always choose as the next column the one that has the largest residual with respect to the subspace spanned by the columns that were selected before. The hope is that in the resulting QR factorization (2) of A the ill-conditioning of A will reveal itself by a small trailing subblock of R: if $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_n$ are the singular values of A and we partition R into

$$R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix} \qquad (4)$$

with an r × r lower right-hand block $R_{22}$, then it is easy to show [12, page 19] that

$$\sigma_{n-r+1}(A) \le \|R_{22}\|_2.$$

While there are counterexamples (see section 5) where the column pivoting strategy fails to reveal ill-conditioning of A, it works well in practice.
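A minimal Python sketch of QR with column pivoting in the sense just described. At each step it picks the remaining column with the largest residual norm. It departs from the text in two ways:

- it uses modified Gram-Schmidt rather than Householder transformations;
- it recomputes residual norms from scratch instead of updating them cheaply.

The test matrix is made up. Its third column equals the sum of the first two plus $10^{-9}$ in one entry, so A is nearly rank deficient. For r = 1, (4) reduces to $\sigma_n(A) \le |r_{nn}|$, and pivoting is meant to expose the near rank deficiency through a tiny $|r_{nn}|$.

```python
import math

def qr_column_pivoting(A):
    """Modified Gram-Schmidt QR with column pivoting, A P = Q R (small dense A).

    Returns (perm, R): perm[k] is the original index of the k-th pivot column.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]   # residual columns
    perm = list(range(n))
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        norms = [math.sqrt(sum(t * t for t in cols[j])) for j in range(k, n)]
        p = k + max(range(n - k), key=lambda i: norms[i])     # largest residual
        cols[k], cols[p] = cols[p], cols[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k):                                    # swap computed R entries too
            R[i][k], R[i][p] = R[i][p], R[i][k]
        r_kk = norms[p - k]
        R[k][k] = r_kk
        if r_kk == 0.0:                                       # remaining columns are zero
            break
        q = [t / r_kk for t in cols[k]]
        for j in range(k + 1, n):                             # project q out of the rest
            r = sum(a * b for a, b in zip(q, cols[j]))
            R[k][j] = r
            cols[j] = [b - r * a for a, b in zip(q, cols[j])]
    return perm, R

A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 2.0 + 1e-9],
     [2.0, 1.0, 3.0]]
perm, R = qr_column_pivoting(A)
print(perm, [abs(R[k][k]) for k in range(3)])   # last diagonal entry is tiny
```

Because residual norms can only shrink as columns are projected out, the pivoted diagonal satisfies $|r_{11}| \ge |r_{22}| \ge \ldots$; the tests below check this on the example.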

Another alternative is to compute the Singular Value Decomposition (SVD)

$$A = U \Sigma V^T \qquad (5)$$

of A. Here U and V are orthogonal matrices whose columns are the left- and right-singular vectors of A, respectively. $\Sigma = \mathrm{diag}(\sigma_i)$ contains the singular values of A. The SVD is at least twice as expensive to compute as the QR factorization, and for that reason the QR factorization with column pivoting is usually preferred.

There is also a middle path between QR factorization and SVD. As was pointed out originally in [11], we can use the singular vector corresponding to the smallest singular value to find a permutation P that will guarantee a small $r_{nn}$ if $\sigma_n$ is small. Chan [4] and Foster [9] extend this idea to higher dimensions. Their idea is to first compute any QR factorization of A and then "peel off" the small singular values of R one after the other by computing an appropriate singular vector at each step. Let us from now on assume that A has r small singular values and that there is a well-defined gap between $\sigma_{n-r}$ and $\sigma_{n-r+1}$. It is shown in [11] that a well-defined gap is necessary to make a sensible decision on the numerical rank of A. Chan then proves that if r is not too large, his algorithm will compute a "rank-revealing QR factorization" in the sense that $R_{22}$ in (4) is guaranteed to be small.

On a single processor the Householder QR factorization without pivoting requires $O(mn^2)$ flops, column pivoting requires an additional $n^2$ flops, and the rank-revealing QR algorithm requires an additional $3rn^2$ flops on average [4]. So the computational complexity of these algorithms is comparable on a single-processor machine.

The  situation  is  quite  different  on  a  multiprocessor  machine  especially  if  it  is  based  on 
a  distributed  architecture.  The  Householder  QR  algorithm  without  pivoting  can  be  very 
efficiently  parallelized  simply  by  pipelining  the  computation  [1,20].  So  one  processor  can  still 
be  busy  finishing  a  previous  update  while  another  already  computes  the  next  Householder 
vector.  The  introduction  of  column  pivoting  makes  pipelining  impossible  since  all  processors 
have  to  synchronize  to  select  the  next  pivot  column.  Hence  the  Householder  QR  algorithm 
essentially  proceeds  in  a  lockstep  fashion  which  results  in  a  serious  loss  of  efficiency  on 
machines  that  previously  could  profit  from  the  pipelining. 

In Chan's algorithm the steps after the initial QR factorization are hard to parallelize. For each of the r small singular values of A, the algorithm computes an approximate singular vector via inverse iteration. On average this requires two iteration steps [4] and hence the solution of four triangular equation systems per small singular value. Although much progress has been made recently in solving triangular equation systems on distributed architectures [7, 14, 13, 18], this problem can by no means be parallelized as efficiently as the initial QR factorization. In addition, the permutation deduced from the singular vector destroys the upper triangular shape of R, which then has to be restored by a sequence of Givens rotations. Again, that is essentially a sequential process that is hard to parallelize [7]. Apart from their sequential nature, an inherent difficulty in parallelizing the equation solving and QR update steps is that the computational work is of the same order of magnitude as the amount of data it involves. That is, we have to perform $O(n^2)$ flops using $O(n^2)$ data. Since R is distributed throughout the system, it is hard to mask the communication overhead with the little arithmetic work to be performed. So the post-processing of R can end up being a good part of the overall computation time on a parallel machine.

In this paper we suggest a new QR decomposition algorithm that avoids these penalties and can be efficiently parallelized. By using a local pivoting strategy we are able to pipeline the computation and at the same time identify the set of columns of A responsible for its ill-conditioning. In section 2 we outline the pipelined Householder QR algorithm and motivate the local pivoting strategy. Section 3 introduces a condition estimator that allows us to monitor the numerical soundness of the local pivoting strategy and presents numerical results showing its robustness. Section 4 combines these ideas into an effective parallel algorithm for determining the numerical rank of A and thus solving the linear least squares problem. Numerical results obtained by simulating the parallel algorithm are presented in section 5. We summarize our results and outline possible directions of future research in section 6.

2  The  Householder  QR  Algorithm  with  Local  Column  Pivoting 

The Householder QR algorithm without pivoting processes the columns of A in their natural order from left to right. If we have a parallel machine, it is natural to group the processors into a logical ring and deal out columns in a round-robin fashion. This technique staggers the computation across the processors and guarantees a load-balanced computation. It allows simple static assignment of data to processors and is for the most part synchronized by the flow of data between processors. Due to these attractive characteristics, the pipelining technique has been widely used [10,13,16]. If special vector hardware can be exploited, several Householder matrices can be bundled together by using the WY factorization [2,21] to arrive at a block pipelined algorithm [1].

In contrast, the QR algorithm with traditional column pivoting chooses, at each step, the column that has the largest residual with respect to the subspace spanned by the columns already selected. This residual is easy to compute and can be updated cheaply as new columns are selected [8]. In the parallel setting, however, the selection of the pivot column introduces a synchronization point. Each processor can easily choose its local candidate pivot column by considering only the columns that are assigned to it. Choosing the global pivot column, on the other hand, requires one of two things: either each processor makes its local pivot information known to all other processors, or a designated processor collects all the local pivot information. So global pivoting essentially forces the program into a lockstep mode that may severely curtail performance.
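To make the traditional strategy concrete, here is a minimal sequential Python sketch of Householder QR with global column pivoting. It also shows the cheap residual downdate described above. This is an illustration only, not the parallel algorithm. The function names `householder` and `qr_column_pivoting` are hypothetical, and matrices are plain lists of rows. The reflector convention H(u) = I - 2uu^T with ||u||_2 = 1 is assumed.

```python
import math


def householder(x):
    """Unit vector u with (I - 2 u u^T) x = beta * e_1; a zero x gives u = 0 (H = I)."""
    normx = math.sqrt(sum(t * t for t in x))
    if normx == 0.0:
        return [0.0] * len(x)
    v = list(x)
    v[0] += math.copysign(normx, x[0])        # beta = -sign(x_1)||x||_2 avoids cancellation
    normv = math.sqrt(sum(t * t for t in v))
    return [t / normv for t in v]


def qr_column_pivoting(A):
    """Householder QR with traditional (global) column pivoting of an m x n matrix, m >= n.

    Returns (R, perm): R is the n x n upper triangular factor of A[:, perm].
    """
    m, n = len(A), len(A[0])
    C = [row[:] for row in A]                  # work on a copy of A
    perm = list(range(n))
    res = [math.sqrt(sum(C[i][j] ** 2 for i in range(m))) for j in range(n)]
    for k in range(n):
        # pivot: the column with the largest residual w.r.t. the columns already chosen
        p = max(range(k, n), key=lambda j: res[j])
        for row in C:
            row[k], row[p] = row[p], row[k]
        perm[k], perm[p] = perm[p], perm[k]
        res[k], res[p] = res[p], res[k]
        u = householder([C[i][k] for i in range(k, m)])
        for j in range(k, n):                  # C(k:m, k:n) <- H(u) C(k:m, k:n)
            z = sum(u[i - k] * C[i][j] for i in range(k, m))
            for i in range(k, m):
                C[i][j] -= 2.0 * u[i - k] * z
        for j in range(k + 1, n):              # downdate: res_j^2 <- res_j^2 - r_kj^2
            res[j] = math.sqrt(max(res[j] ** 2 - C[k][j] ** 2, 0.0))
    R = [[C[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return R, perm
```

Each step scans all remaining residuals before it picks a pivot. A distributed version would therefore need a global reduction at every step, which is exactly the synchronization point discussed above.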

The easiest way out of this dilemma is to forgo global pivoting altogether and content oneself with local pivoting. A simplified version of the resulting algorithm for a ring of processors is given in Figure 1. We distribute the columns of A to processors in a round-robin fashion. To be precise, assume that we have p processors proc_0, ..., proc_{p-1} and that a_j is the jth column of A. Then processor proc_i receives the columns a_j for which

$$ i = (j - 1) \bmod p. $$

This is commonly referred to as the column wrap mapping. The array C is local to each processor and contains the cols_k columns assigned to processor proc_k (so that $\sum_k \mathit{cols}_k = n$). p_left and p_right designate the left and right neighbour of proc_k, respectively.
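The column wrap mapping and the local-to-global index perm_i used in Figure 1 can be written out directly. The following is a small Python sketch with 1-based column indices, as in the text. The function names are hypothetical.

```python
def owner(j, p):
    """Processor that holds global column j (1-based) under the column wrap mapping."""
    return (j - 1) % p


def global_column(k, i, p):
    """Global index of the i-th local column (1-based) on processor k, i.e. perm_i in Figure 1."""
    return (k + 1) + (i - 1) * p


def cols(k, n, p):
    """Number of columns cols_k held by processor k when A has n columns."""
    return len(range(k + 1, n + 1, p))


if __name__ == "__main__":
    n, p = 10, 4
    for k in range(p):
        print(k, [global_column(k, i, p) for i in range(1, cols(k, n, p) + 1)])
```

With n = 10 and p = 4 it prints `0 [1, 5, 9]`, `1 [2, 6, 10]`, `2 [3, 7]` and `3 [4, 8]`. So cols_k differs by at most one between processors, which is what keeps the pipeline load-balanced.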

u ← genhh(x) returns u as defined by (3), and A ← apphh(u, A)
processor proc_k

  lcnt <- 0;   {counter for HH vectors generated in proc_k}
  gcnt <- 0;   {counter for HH vectors generated globally}
  foreach i in {1, ..., cols_k} do
    perm_i <- (k + 1) + (i - 1)p;   {wrap mapping}
    res_i <- ||C(:, i)||_2;
  end foreach

  if (k = 0) then   {determine first pivot column}
    lcnt <- gcnt <- 1;
    determine first pivot column, send it to p_right;
    update all other columns as shown in main loop below;
  end if

  while (lcnt < cols_k) do   {main loop}
    receive u from p_left; gcnt <- gcnt + 1;
    if (u not generated by p_right) then send u to p_right end if
    if (k = gcnt mod p) then   {my turn to generate next HH vector}
      lcnt <- lcnt + 1;
      {complete enough of H(u) update to determine next pivot column}
      z <- C(gcnt:m, lcnt:cols_k)^T u;
      C(gcnt, lcnt:cols_k) <- C(gcnt, lcnt:cols_k) - 2u(1) z^T;
      res_i <- sqrt(res_i^2 - C(gcnt, i)^2),  i in {lcnt, ..., cols_k}
      let pvt in {lcnt, ..., cols_k} be such that res_pvt is maximal
      {guarded pivoting strategy will be inserted here}
      C(gcnt+1:m, pvt) <- C(gcnt+1:m, pvt) - 2u(2:m-gcnt+1) z(pvt);
      v <- genhh(C(gcnt+1:m, pvt)); gcnt <- gcnt + 1;   {v: the new HH vector}
      if (gcnt < n) then send v to p_right end if
      {complete H(u) update on the remaining columns}
      C(gcnt:m, lcnt:pvt-1) <- C(gcnt:m, lcnt:pvt-1) - 2u(2:m-gcnt+2) z(lcnt:pvt-1)^T;
      C(gcnt:m, pvt+1:cols_k) <- C(gcnt:m, pvt+1:cols_k) - 2u(2:m-gcnt+2) z(pvt+1:cols_k)^T;
      C(:, pvt) <-> C(:, lcnt); perm_pvt <-> perm_lcnt; res_pvt <-> res_lcnt;
      C(gcnt:m, lcnt+1:cols_k) <- apphh(v, C(gcnt:m, lcnt+1:cols_k));
    else   {apply H(u) update}
      C(gcnt:m, lcnt+1:cols_k) <- apphh(u, C(gcnt:m, lcnt+1:cols_k));
    end if
    res_i <- sqrt(res_i^2 - C(gcnt, i)^2),  i in {lcnt+1, ..., cols_k}
  end while


Figure 1: The Pipelined QR Algorithm with Local Pivoting
returns H(u)A. The vector perm is used to store the permutation matrix: if perm(i) = l, then the lth column of A has been permuted into the ith position. The vector res contains the residuals of the columns not yet chosen with respect to the columns already selected. To save space, “HH” is used as an abbreviation for “Householder”.

It is worth pointing out that a processor that has to generate a new pivot column completes only as much of the previous Householder update as is necessary to update res and to determine the next pivot column. This is important, since we want to keep other nodes from sitting idle while they wait for a new Householder vector to arrive.
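The following is a hedged sketch of why it suffices to finish only part of the previous update first. It uses hypothetical names, plain lists of rows, and a single processor's local columns. Because H(u) is orthogonal, the residuals can be downdated as soon as row g of H(u)C is known. The pivot can therefore be chosen before rows g+1, ..., m-1 are touched, and the final matrix agrees with applying H(u) in one go.

```python
import math


def apply_hh_full(u, C, g):
    """Reference update: C[g:, j] <- (I - 2 u u^T) C[g:, j] for every local column j."""
    for j in range(len(C[0])):
        z = sum(u[i] * C[g + i][j] for i in range(len(u)))
        for i in range(len(u)):
            C[g + i][j] -= 2.0 * u[i] * z


def pivot_after_partial_update(u, C, g, res):
    """Update row g first, pick the pivot, then finish the remaining rows.

    res[j] is the norm of C[g:, j] before u is applied; the returned list holds
    the residuals of the columns with respect to rows g+1, ..., m-1 afterwards.
    """
    ncols = len(C[0])
    z = [sum(u[i] * C[g + i][j] for i in range(len(u))) for j in range(ncols)]
    for j in range(ncols):                     # row g of H(u) C only
        C[g][j] -= 2.0 * u[0] * z[j]
    new_res = [math.sqrt(max(res[j] ** 2 - C[g][j] ** 2, 0.0)) for j in range(ncols)]
    pvt = max(range(ncols), key=lambda j: new_res[j])
    # Figure 1 would now finish the pivot column, generate and send the new
    # HH vector, and only then finish the other columns; this sketch simply
    # completes all remaining rows of every column here.
    for j in range(ncols):
        for i in range(1, len(u)):
            C[g + i][j] -= 2.0 * u[i] * z[j]
    return pvt, new_res
```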

The obvious problem with the strictly local pivoting strategy is reliability. As a pathological example, assume that all columns in processor 1 are nearly equal. Processor 1 will then make bad choices after it has generated the very first Householder vector. The resulting upper triangular matrix R will be very ill-conditioned but will not necessarily have a small lower right-hand block. So for the local pivoting strategy to be reliable, we have to guard against choosing nearly dependent pivot columns.


3  An Incremental Estimator for the Smallest Singular Value of a Triangular Matrix

To guard against choosing “bad” pivot columns, we have to monitor the smallest singular value $\sigma_{\min}(R_i)$. Here $R_i$ is the leading $i \times i$ upper triangular matrix generated after applying $i$ Householder transformations to A. Computing $\sigma_{\min}(R_i)$ exactly, by inverse iteration for example, is too expensive, especially since a good order-of-magnitude estimate suffices for our purposes.

A common idea underlying condition estimators [5,6] is to exploit the implication

$$ R x = d \quad\Longrightarrow\quad \frac{1}{\sigma_{\min}(R)} \ge \frac{\|x\|_2}{\|d\|_2} $$

by generating a large norm solution x to a moderately sized right-hand side d, and then using

$$ \hat\sigma_{\min}(R) := \frac{\|d\|_2}{\|x\|_2} $$

as an estimate for $\sigma_{\min}(R)$. The hope is that x will be an approximate singular vector corresponding to the smallest singular value, and that as a consequence $\hat\sigma_{\min}(R)$ will not overestimate $\sigma_{\min}(R)$ by much. Our choice of algorithms for a condition estimator is severely restricted: it is not feasible to access the previously generated R when we want to decide on the suitability of a new pivot column. To be more precise, suppose we are given a good estimate $\hat\sigma_{\min}(R)$, defined by a large norm solution x to $R^T x = d$, and a new column $\binom{w}{\gamma}$ of

$$ R' = \begin{pmatrix} R & w \\ 0 & \gamma \end{pmatrix}. $$

We then want to obtain a large norm solution y to

$$ R'^T y = d' $$

without accessing R again. None of the condition estimators surveyed by Higham [15] has that property, but the two-norm condition estimator suggested by Cline, Conn and Van Loan [5] can be modified to conform to those restrictions. The idea then is the following:


Given x such that $R^T x = d$ with $\|d\|_2 = 1$, find $s := \sin\theta$ and $c := \cos\theta$ such that $\|y\|_2$ is maximized, where

$$ y = R'^{-T} \begin{pmatrix} s\,d \\ c \end{pmatrix}. \tag{6} $$

Here we exploit the fact that R' and $R'^T$ have identical singular values. An easy calculation shows that maximizing $\|y\|_2$ is equivalent to maximizing

$$ \Phi(s, c) = \beta s^2 - 2\alpha s c, \tag{7} $$

where

$$ \alpha = w^T x \quad\text{and}\quad \beta = \gamma^2 x^T x + \alpha^2 - 1. $$
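For completeness, the “easy calculation” can be spelled out. Assume $\gamma \neq 0$ and partition $y = \binom{z}{\delta}$ conformally with R'. The first block row of (6) gives $R^T z = s\,d$, so $z = s\,x$. The last row gives $w^T z + \gamma\delta = c$, so $\delta = (c - s\alpha)/\gamma$. Using $s^2 + c^2 = 1$:

$$
\begin{aligned}
\|y\|_2^2 &= s^2\, x^T x + \frac{(c - s\alpha)^2}{\gamma^2},\\
\gamma^2 \|y\|_2^2 &= s^2 \gamma^2 x^T x + c^2 - 2\alpha s c + \alpha^2 s^2 \\
&= s^2\,(\gamma^2 x^T x + \alpha^2 - 1) + (s^2 + c^2) - 2\alpha s c \\
&= \Phi(s, c) + 1 .
\end{aligned}
$$

Since $\gamma$ is fixed, maximizing $\|y\|_2$ is the same as maximizing $\Phi$. Because the right-hand side of (6) has unit norm, $1/\|y\|_2$ is the resulting estimate.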

Taking derivatives in (7) and setting $\eta = \beta/(2\alpha)$, we find two possible solutions:

$$ s_{1,2} = \frac{1}{\sqrt{1 + \mu_{1,2}^2}}, \qquad \text{where} \quad \mu_{1,2} = \eta \pm \sqrt{1 + \eta^2}. $$

The corresponding cosine values are

$$ c_{1,2} = \mu_{1,2}\, s_{1,2}. $$

To choose between the two possibilities, we compute $\Phi(s_1, c_1)$ and $\Phi(s_2, c_2)$ and choose the sine/cosine pair that gives the greater value of $\Phi$. For the special case $\alpha = 0$ we obtain $c_1 = 1$, $s_1 = 0$ and $c_2 = 0$, $s_2 = 1$. The new approximate singular vector y as defined by (6) is then given by $y = \binom{z}{\delta}$, setting

$$ z := s\,x \quad\text{and}\quad \delta := \frac{c - s\alpha}{\gamma}. $$
The resulting estimate for the smallest singular value $\sigma_{\min}(R')$ of R' is

$$ \hat\sigma_{\min}(R') = \frac{1}{\|y\|_2}. $$

From this description it is clear that this condition estimator satisfies our algorithmic constraints. Given a current $R_i$, we only need to save the current solution x and its norm $\|x\|_2$ to arrive at an estimate for $\sigma_{\min}(R_{i+1})$. Furthermore, the calculation is inexpensive. For a $k \times k$ matrix $R_k$ we need only 2k flops to arrive at an estimate for $\sigma_{\min}(R_{k+1})$. So altogether it costs only $O(n^2)$ flops to run this condition estimator alongside the generation of an $n \times n$ triangular matrix.
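Below is a minimal pure-Python sketch of the estimator described above. It assumes R is stored as a list of rows, the first step uses d = (1) so that x = 1/r_11, and every new diagonal entry γ is nonzero. The names `ice_update` and `estimate_sigma_min` are hypothetical. The sketch does not form μ_2 = η - sqrt(1 + η^2) directly. Instead it uses μ_1 μ_2 = -1, which follows from μ^2 - 2ημ - 1 = 0 and avoids cancellation. This is a numerical choice of the sketch, not necessarily of the original implementation.

```python
import math


def ice_update(x, xx, w, gamma):
    """One step of the incremental estimator.

    x     : current approximate singular vector, R^T x = d with ||d||_2 = 1
    xx    : x^T x, carried along so that R itself is never touched again
    w     : first k entries of the new column of R' = [[R, w], [0, gamma]]
    gamma : new diagonal entry (assumed nonzero)
    Returns (y, y^T y, estimate 1/||y||_2 of sigma_min(R')).
    """
    alpha = sum(wi * xi for wi, xi in zip(w, x))          # alpha = w^T x
    beta = gamma * gamma * xx + alpha * alpha - 1.0       # beta as in (7)

    def phi(s, c):                                        # Phi(s, c) = beta s^2 - 2 alpha s c
        return beta * s * s - 2.0 * alpha * s * c

    if alpha == 0.0:
        candidates = [(0.0, 1.0), (1.0, 0.0)]             # special case alpha = 0
    else:
        eta = beta / (2.0 * alpha)
        mu1 = eta + math.copysign(math.sqrt(1.0 + eta * eta), eta)
        mu2 = -1.0 / mu1                                  # mu1 * mu2 = -1, no cancellation
        candidates = []
        for mu in (mu1, mu2):
            s = 1.0 / math.sqrt(1.0 + mu * mu)
            candidates.append((s, mu * s))                # c = mu * s
    s, c = max(candidates, key=lambda sc: phi(sc[0], sc[1]))
    delta = (c - s * alpha) / gamma
    y = [s * xi for xi in x] + [delta]                    # y = (z; delta) with z = s x
    yy = s * s * xx + delta * delta
    return y, yy, 1.0 / math.sqrt(yy)


def estimate_sigma_min(R):
    """Run the estimator column by column on an upper triangular R (list of rows)."""
    x = [1.0 / R[0][0]]                                   # R_1^T x = d with d = (1)
    xx = x[0] * x[0]
    est = abs(R[0][0])
    for k in range(1, len(R)):
        w = [R[i][k] for i in range(k)]
        x, xx, est = ice_update(x, xx, w, R[k][k])
    return est, x
```

For R = diag(1, 10^-3, 2) the sketch returns 0.001. In general 1/||y||_2 can only overestimate σ_min(R), because the final y satisfies R^T y = d with ||d||_2 = 1.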

To assess the accuracy of our condition estimator, we performed the suite of tests suggested by Higham [15]. Three different types of test matrices are employed. In each test, upper triangular matrices R were generated by computing the QR factorization of various $n \times n$ matrices A for n = 10, 25, 50, both with and without column pivoting.

Test 1 (see Table 1): The elements of A were chosen as random numbers from the uniform distribution on [-1, 1]. Fifty matrices were generated for each n. As observed by Higham, this type of matrix is usually well-conditioned. Over the whole test, the minimum, maximum and average values of the two-norm condition number $\kappa_2(A) = \sigma_1/\sigma_n$ were 21, $1.4\cdot 10^{?}$ and $2.0\cdot 10^{?}$, respectively.

Test 2 (see Tables 2 and 3) and Test 3: In these tests we used random matrices A with preassigned singular value distributions $\{\sigma_i\}$. Random orthogonal matrices U and V were generated using the method of Stewart [22], and A was then formed as in (5). For each value of n and each singular value distribution, fifty matrices were generated by choosing different matrices U and V. For Test 2 we chose the exponential distribution

$$ \sigma_i = \alpha^i, \qquad 1 \le i \le n, $$

where α is determined by $\kappa_2(A)$. For Test 3 we chose the sharp-break distribution.

The figures given in Tables 1-3 are the ratios

$$ \hat\sigma_{\min}(R)/\sigma_{\min}(R) \ge 1. $$

The first number in each pair is the maximum ratio over the fifty matrices and the second is the average ratio. All results were rounded to two significant digits. For Test 3 we observed a ratio of 1.0 (i.e. the estimate had at least two correct figures) in all cases. These results show that our condition estimator does produce good estimates. We overestimate $\sigma_{\min}(R)$
Table 1: Results of Test 1


pivoting 

n  =  10 

25 

50 

no 

mwngm 

7.0/3.1 

yes 

3.9/2.6 

Table 2: Results of Test 2 without Pivoting

κ_2      n = 10    n = 25    n = 50
10       1.8/1.3   1.7/1.4   1.6/1.4
10^?     3.0/1.9   2.5/2.0   3.2/2.2
10^?     8.1/1.9   6.3/2.6   4.2/2.8
10^?     6.1/2.2   5.9/3.0   5.2/3.2

Table 3: Results of Test 2 with Pivoting


n  =  10 

25 

50 

10 

1.6/1.3 

1.6/1.4 

1.7/1.4 

10® 

2.2/1.5 

2.3/1.8 

10® 

2.8/1.5 

3.4/2.1 

3.4/2.5 

10® 

2.4/1.6 

3.3/2.2 

4.3/2.7 
only by a small factor, and the results vary only slightly with condition number, matrix size and singular value distribution. Pivoting increases the accuracy of the condition estimator, and we can confidently expect similar accuracy when applying this estimator to matrices R generated by the local pivoting strategy.


4  The  QR  Algorithm  with  Controlled  Local  Pivoting 

With the condition estimator, we now have the tool to ensure the numerical stability of the local pivoting strategy. Using the same notation as in the algorithm of Figure 1, processor k can now check whether C(:, j) is a reasonable choice for the next pivot column before computing u. Assume that processor k knows the current approximate singular vector x as well as $\|x\|_2$ for the current upper triangular matrix $R_{gcnt}$. Then all that is needed for the next condition estimator step is the last column $\binom{w}{\gamma}$ of $R_{gcnt+1}$. But

$$ w = C(1{:}gcnt,\, j) $$

has already been computed, and from the definition of u and res it follows immediately that

$$ \gamma = -\operatorname{sign}\big(C(gcnt+1,\, j)\big)\, res_j. $$

So all the information for the next condition estimator step is readily at hand, and we can compute a new approximate singular vector y for $R_{gcnt+1}$.
With

$$ \Omega = \max_{1 \le i \le n} \|a_i\|_2 $$

being the norm of the largest column of A, we then take

$$ \hat\sigma_{\min}(R_{gcnt+1}) = \frac{1}{\eta_1 \|y\|_2} \tag{8} $$

as an estimate for the smallest singular value of $R_{gcnt+1}$, and

$$ \hat\kappa(R_{gcnt+1}) = \eta_2\, \Omega\, \|y\|_2 \tag{9} $$

as an estimate for the true condition number of $R_{gcnt+1}$. The scaling factors $\eta_i$ reflect the trust we have in the accuracy of our estimates. Based on the numerical results of section 3, we recommend $\eta_1 = 3$ and $\eta_2 = 10$. The choice of $\eta_2$ reflects the fact that the norm of the largest column is in general a good estimator for the largest singular value of a matrix.

Comparing the estimate (8) or (9) against a chosen threshold, we then accept or reject a candidate pivot column. The exact threshold depends heavily on the application, in particular on the accuracy of the initial data. If the data is accurate to machine precision ε, a candidate pivot will in general be rejected if $\hat\kappa(R_{gcnt+1}) = O(1/\epsilon)$.
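The following is a hedged sketch of the accept/reject test built on (8) and (9). The names `column_norm_max` and `accept_candidate` are hypothetical, as are the optional thresholds `sigma_tol` and `kappa_tol`; the sketch leaves the choice of threshold to the caller, as the text does. For example, `sigma_tol=1e-7` with η_1 = 3 rejects exactly when 1/||y||_2 < 3·10^-7, which is the setting used in section 5.

```python
import math


def column_norm_max(A):
    """Omega = max_i ||a_i||_2 over the columns of A (list of rows)."""
    return max(math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(len(A[0])))


def accept_candidate(ynorm, omega, sigma_tol=None, kappa_tol=None, eta1=3.0, eta2=10.0):
    """Guarded pivoting test based on (8) and (9).

    ynorm : ||y||_2 from the incremental estimator for R_{gcnt+1}
    omega : norm of the largest column of A
    Returns (accept, sigma_hat, kappa_hat).
    """
    sigma_hat = 1.0 / (eta1 * ynorm)          # (8): discounted estimate of sigma_min
    kappa_hat = eta2 * omega * ynorm          # (9): inflated estimate of the condition number
    accept = True
    if sigma_tol is not None and sigma_hat < sigma_tol:
        accept = False
    if kappa_tol is not None and kappa_hat > kappa_tol:
        accept = False
    return accept, sigma_hat, kappa_hat
```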

If the candidate pivot column is rejected, processor k has exhausted its supply of “reasonable” columns. From then on it will only apply Householder vectors generated by other processors to its remaining columns. If, on the other hand, we accept the candidate pivot column, then processor k will actually compute u, send (u, y, $\|y\|_2$) to its right neighbour, and then proceed as in Figure 1. Note that y and $\|y\|_2$ have to be forwarded only to the processor that will generate the next Householder vector, while u will eventually be known to all processors. So propagating the condition estimator results causes only a minor increase in data traffic.

This scheme continues until no processor has any acceptable pivot candidate left. Assuming that we generated $\bar n = n - f$ Householder vectors altogether, we have at this point computed the incomplete QR factorization

$$ AP = (Q_1, Q_2) \begin{pmatrix} R_{\bar n} & R_{12} \\ 0 & \tilde A \end{pmatrix}, \tag{10} $$

where $Q_1$ is $m \times \bar n$, $Q_2$ is $m \times (m - \bar n)$ and $Q = [Q_1, Q_2]$ is orthogonal. $R_{\bar n}$ is upper triangular of size $\bar n \times \bar n$, and $\tilde A$ is of size $(m - \bar n) \times f$. Our controlled pivoting strategy gives us an estimate for $\sigma_{\min}(R_{\bar n})$. Furthermore, we know that adding any of the leftover f columns of AP would push the smallest singular value below our chosen threshold. So we have good reason to assume that f is the dimension of the numerical null space of A. We can then set $\tilde A$ in (10) to zero and use the resulting truncated QR factorization to solve the least squares problem (1).
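The text does not say which least squares solution is computed from the truncated factorization (10). A minimal sketch, assuming the basic solution (the coefficients of the f rejected columns are set to zero) and that $Q_1^T b$ has already been formed, is a back substitution with $R_{\bar n}$ followed by undoing the permutation P. `basic_solution` is a hypothetical name, and `perm` uses 0-based indices.

```python
def basic_solution(R11, qtb, perm, n):
    """Basic least squares solution from the truncated QR factorization (10).

    R11  : nbar x nbar upper triangular block R_nbar (list of rows)
    qtb  : first nbar entries of Q_1^T b
    perm : perm[k] = original index of the column in position k of AP
    n    : number of columns of A
    The f = n - nbar rejected columns get zero coefficients.
    """
    nbar = len(R11)
    x1 = [0.0] * nbar
    for i in range(nbar - 1, -1, -1):          # back substitution R_nbar x1 = qtb
        s = qtb[i] - sum(R11[i][j] * x1[j] for j in range(i + 1, nbar))
        x1[i] = s / R11[i][i]
    x = [0.0] * n
    for k in range(nbar):                      # undo the column permutation P
        x[perm[k]] = x1[k]
    return x
```

For R11 = [[2, 1], [0, 4]], Q_1^T b = (5, 8) and perm = [2, 0, 1], the result is [2.0, 0.0, 1.5]. A minimum-norm solution would additionally need to eliminate $R_{12}$, which this sketch does not attempt.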

5  Numerical  Experiments 

To assess the numerical behavior of the proposed local pivoting scheme, we simulated the parallel algorithm using PRO-MATLAB and compared it with the traditional QR factorization algorithm with global column pivoting. Various $50 \times 50$ matrices were generated, and the local pivoting strategy was simulated on 8 processors.

For tests 1 to 3 we generated 50 random matrices for each singular value distribution $\{\sigma_i\}$. For all matrices the largest and smallest singular values were 1 and $10^{-9}$, respectively.

Break 1 Distribution: $\sigma_1 = \dots = \sigma_{49} = 1$; $\sigma_{50} = 10^{-9}$.

Break 9 Distribution: $\sigma_1 = \dots = \sigma_{41} = 1$; $\sigma_{42} = \dots = \sigma_{50} = 10^{-9}$.

Exponential Distribution: $\sigma_i = \alpha^{i-1}$; $\alpha = (10^{-9})^{1/49}$.
Table 4: min/avg/max Values of the Condition Numbers of R using Local and Global Pivoting

Distribution   κ_par(R)                κ_trad(R)               κ_opt(R)
break 1        2.8 / 4.8 / 9.5         2.1 / 2.7 / 3.6         1.0
break 9        4.0 / 12 / 180          3.4 / 4.4 / 6.1         1.0
exponential    5.0e6 / 8.8e7 / 1.9e7   4.2e6 / 7.2e6 / 1.3e7   9.5e6

We set the rejection threshold for the smallest singular value to $10^{-7}$ and discount the estimate (8) for the smallest singular value by a factor of $\eta_1 = 3$. We then reject a candidate pivot column in the parallel algorithm if

$$ \frac{1}{\|y\|_2} < 3 \cdot 10^{-7}. $$

For the traditional QR factorization algorithm we use the last diagonal entry of $R_{gcnt+1}$ as an estimate, and reject a candidate pivot column if

$$ |r_{gcnt+1,\,gcnt+1}| < 3 \cdot 10^{-7}. $$

Table 4 shows the condition number κ(R) of the upper triangular matrices R generated from those matrices by controlled local pivoting and by traditional column pivoting. Let $\sigma_{cut}$ be the smallest singular value greater than $10^{-7}$. Then the optimal value we can achieve for κ(R) is $\kappa_{opt}(R) = 1/\sigma_{cut}$. Furthermore, let $\kappa_{par}(R)$ be the condition number resulting from the parallel scheme and $\kappa_{trad}(R)$ the condition number resulting from the traditional column pivoting scheme. For $\kappa_{par}(R)$ and $\kappa_{trad}(R)$, the observed minimum, average and maximum values are displayed. These results show that guarded local pivoting is about as effective as full column pivoting in generating a well-conditioned R, even if the number of local columns is fairly small.

For the sharp break distributions there is a well-defined gap between the singular values before and after the acceptance threshold, and both local and global column pivoting identify the numerical nullspace correctly in all cases. As pointed out earlier, determining the numerical rank becomes problematic if there is no well-defined gap between the singular values considered “large” and those considered “small”. The exponential distribution is such a problematic case. There are 39 singular values that are larger than $10^{-7}$, but there is no well-defined break. To be exact:

$$ \sigma_{37} = 2.4\cdot 10^{-7}, \quad \sigma_{38} = 1.6\cdot 10^{-7}, \quad \sigma_{39} = 1.04\cdot 10^{-7} \quad\text{and}\quad \sigma_{40} = 6.8\cdot 10^{-8}. $$
Table 5: Frequency of Accepting Columns for the Exponential Distribution


No.  of  columns  accepted 

36 

37 

38 

local  pivoting 

14 

30 

6 

global  pivoting 

5 

42 

The column pivoting strategies reflect this difficulty by accepting fewer than 39 columns; the observed results are displayed in Table 5. They show that, even for an ill-defined problem, the guarded local pivoting scheme is very reliable, in that it leans towards a small underestimate of the dimension of the range space of A.

Our last example shows the advantage of integrating the incremental condition estimator with the global pivoting scheme. Let


$$ A_n = \operatorname{diag}(1, s, s^2, \dots, s^{n-1}) \begin{pmatrix} 1 & -c & -c & \cdots & -c \\ 0 & 1 & -c & \cdots & -c \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -c \\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix} + \operatorname{diag}(n\epsilon, (n-1)\epsilon, \dots, \epsilon), $$

where $c^2 + s^2 = 1$ and ε is the machine precision. $A_n$ is very ill-conditioned. Each leading principal submatrix $A_k$ ($k < n$) is also ill-conditioned, but there is nevertheless a well-defined gap between $\sigma_n$ and $\sigma_{n-1}$. As an example, we have $\sigma_{49} = 1.2\cdot 10^{?}$ and $\sigma_{50} = 3.7\cdot 10^{?}$ for n = 50 and c = 0.5. This is a well-known example where the QR factorization with column pivoting fails. Even in floating point arithmetic the matrix is its own QR factorization, yet no trailing block of R is small enough to reveal its ill-conditioning. In this example both the local and the global pivoting schemes select the columns in their natural order. However, the incremental condition estimator integrated into the parallel scheme detects the ill-conditioning of the leading principal submatrices $A_k$: it never overestimates the smallest singular value by a factor of more than 1.5. So while the column pivoting scheme fails, the incremental condition estimator ensures that the failure will not go unnoticed. Given its negligible extra cost, this suggests that it is worth incorporating the incremental condition estimator into the global column pivoting scheme.
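A small sketch that builds $A_n$ as defined above helps to show why norm-based pivoting keeps the natural order. Without the perturbation, every column has unit 2-norm, since $c^2(1 + s^2 + \dots + s^{2(j-1)}) + s^{2j} = 1$ when $c^2 + s^2 = 1$. Also, $A_n$ is already upper triangular, so it is its own R. The name `kahan` is hypothetical. Taking ε to be Python's `sys.float_info.epsilon` ($2^{-52}$) is an assumption about what “machine precision” means here.

```python
import math
import sys


def kahan(n, c, perturb=True):
    """A_n = diag(1, s, ..., s^(n-1)) K + diag(n eps, ..., eps), with s^2 + c^2 = 1.

    K is unit upper triangular with -c everywhere above the diagonal.
    """
    s = math.sqrt(1.0 - c * c)
    eps = sys.float_info.epsilon
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scale = s ** i                         # row i is scaled by s^i
        A[i][i] = scale + ((n - i) * eps if perturb else 0.0)
        for j in range(i + 1, n):
            A[i][j] = -c * scale
    return A


def column_norms(A):
    """2-norms of the columns of A (list of rows)."""
    return [math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(len(A[0]))]
```

For n = 50 and c = 0.5, the smallest diagonal entry of `kahan(50, 0.5)` is about 8.7·10^-4. So the diagonal of R by itself does not reveal the ill-conditioning, which is what the incremental estimator is meant to catch.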

The matrix $A_n$ is also an example where the local pivoting scheme performs better than the global one. Let $\tilde A_{50}$ be the same matrix as $A_{50}$ except that the order of its columns has been reversed. For the global column pivoting scheme this permutation makes no difference, and the scheme fails. The parallel scheme simulated on 8 processors, on the other hand, correctly identifies the numerical nullspace of $\tilde A_{50}$. This is an exceptional occurrence due to the special structure of $A_n$. It is nonetheless surprising, since intuitively one would expect the global pivoting strategy always to perform better than the local one.

6  Conclusions 

This paper presented a new variant of the Householder QR algorithm with column pivoting. In that context we introduced a new incremental condition estimator that allowed us to update the estimate for the smallest singular value of the upper triangular matrix R as new columns were added to R. The update required only O(n) flops and the saving of O(n) words between successive steps. Despite its small computational cost, experiments with a variety of matrices demonstrated the reliability of the condition estimation algorithm. This condition estimator made it possible to implement a strictly local pivoting scheme for the QR factorization by guarding against an improper choice of pivot columns. Numerical experiments show that the local pivoting scheme performs by and large as well as global pivoting. There are even cases where the local pivoting scheme succeeds while the global pivoting scheme fails.

We also gave an example showing the usefulness of integrating the incremental condition estimator with the traditional global column pivoting strategy. The flops spent on the condition estimator might be a worthwhile investment to guard against the pathological cases that are not revealed by the traditional QR factorization algorithm with column pivoting.

We are currently investigating the effect of a dynamic threshold for the acceptance or rejection of a pivot column. Starting with a relatively conservative threshold and relaxing it as the computation proceeds is likely to result in better-conditioned leading submatrices of R. The penalty is a possible loss of efficiency, as processors might have to reconsider the same column as a pivoting candidate at a later stage of the algorithm.
ABSTRACT

In this paper, we present a parallel algorithm for the solution of systems of nonlinear equations. The algorithm is primarily based on the serial nonlinear Jacobi algorithm. Different parallel implementations are discussed. In particular, a block form is presented for the case when the number of processors is small in comparison to the number of variables. A straightforward implementation is given for the solution of unconstrained minimization problems. Some numerical experiments run on an Encore/Multimax with 20 processors are presented.
1. Introduction. In this paper we present a parallel implementation of the serial nonlinear Jacobi algorithm for the solution of systems of nonlinear equations. The serial algorithm is a particular case of the more general SOR-Newton algorithms. It is based on the same idea as the Jacobi algorithm for solving linear systems of equations. It therefore suffers from the same drawback as its counterpart for linear systems, namely its slow rate of convergence. The serial algorithm was first presented by Wegge [23], later analyzed by Schechter [20,21] and Voigt [22], and most recently implemented by Dennis and Walker [6]. For a detailed overview of SOR-Newton methods see Ortega and Rheinboldt [17].

The algorithm was discarded as a viable way of solving nonlinear systems of equations and replaced by more efficient methods, such as Newton-like methods. However, these latter methods do not lend themselves in a straightforward manner to a parallel environment. Consider, for instance, Newton's method for solving

$$ F(x) = 0 \quad \text{with } F: \mathbb{R}^n \to \mathbb{R}^n. \tag{1.1} $$

The iterative scheme is as follows:

Step 0. Get $x_0$. Set k = 0.

Step 1. Solve $F'(x_k)\, s_k = -F(x_k)$.   (1.2.a)

Step 2. Update $x_{k+1} = x_k + s_k$.   (1.2.b)

Step 3. Set k := k + 1. Go to step 1.
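Steps 0-3 can be sketched in a few lines of Python. `newton` and `solve_linear` are hypothetical names. The caller supplies the Jacobian, the linear system (1.2.a) is solved by Gaussian elimination with partial pivoting, and, as an assumption of the sketch, no globalization (line search or trust region) is included.

```python
def solve_linear(J, r):
    """Gaussian elimination with partial pivoting for a small dense system J s = r."""
    n = len(r)
    M = [J[i][:] + [r[i]] for i in range(n)]   # augmented matrix [J | r]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    s = [0.0] * n
    for i in range(n - 1, -1, -1):             # back substitution
        s[i] = (M[i][n] - sum(M[i][j] * s[j] for j in range(i + 1, n))) / M[i][i]
    return s


def newton(F, J, x0, tol=1e-12, maxit=50):
    """Solve F'(x_k) s_k = -F(x_k) (1.2.a), update x_{k+1} = x_k + s_k (1.2.b)."""
    x = list(x0)
    for _ in range(maxit):
        fx = F(x)
        if max(abs(v) for v in fx) < tol:
            break
        s = solve_linear(J(x), [-v for v in fx])
        x = [xi + si for xi, si in zip(x, s)]
    return x
```

For $F(x) = (x_1^2 + x_2^2 - 4,\ x_1 - x_2)$ started at (1, 0.5), the first step gives (1.75, 1.75), and the iteration converges to $(\sqrt 2, \sqrt 2)$. Each iteration needs the whole Jacobian and a full linear solve, which is the communication bottleneck discussed next.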

The linear system of equations (1.2.a) that arises at each iteration could be solved in parallel; see for instance [19]. However, the amount of message passing involved at each iteration remains a bottleneck. Moreover, if the initial guess is far away from the solution, a globally convergent technique such as a line search or a trust region must be implemented. Such global techniques must be run in parallel; otherwise several processors will be idle while the computation takes place.

Other parallel algorithms have been proposed in the literature for solving (1.1). The one-dimensional case n = 1 has been analyzed in [7] and [14]. Different parallel methods were proposed in which the emphasis was on doing concurrent function evaluations. In [1], Baudet presents an excellent study of asynchronous iterative methods for multiprocessors. He uses the contraction mapping iteration and presents convergence criteria for these methods. However, his numerical experiments were performed on linear functions only. Bojanczyk in [2] uses an asynchronous Newton method in which the function F(x) and the Jacobian of F are calculated in parallel. Since the Jacobian evaluation takes much longer than the function evaluation, Newton steps are taken using a fixed Jacobian until the processor calculating the new Jacobian finishes. He shows that this parallel method will be at most four times faster than the serial method, no matter how many processors are used in the computations. More recently, White [24] presented a parallel nonlinear Newton-SOR algorithm. There, the main iteration is the Newton iteration for solving (1.1). The linear system is then solved using the Gauss-Seidel method with the multi-splitting techniques developed in [15]. He shows convergence of the method and presents some numerical results on a serial computer.

The algorithm we are proposing lends itself in a straightforward fashion to a parallel implementation. In particular, the bigger the dimension of the problem, the higher the speedup that can be attained. The main characteristics of the parallel algorithm are the following. Firstly, one need not solve a linear system of equations at each iteration if the dimension of the problem is less than or equal to the number of processors available. If the dimension of the problem is bigger than the number of processors available, then small systems of linear equations are solved in each processor. In this way the message passing at each iteration is decreased considerably. Secondly, a globally convergent technique can easily be implemented in parallel, since each processor is solving a different system of nonlinear equations. Moreover, such a global procedure need not be the same in each processor; a line search approach could run on certain processors while a trust region approach runs on others. Another important feature of the parallel algorithm is that function evaluations are implicitly done in parallel. Thus, considerable savings are obtained over a serial solver.

The main drawback of the serial algorithm is its slow, linear rate of convergence. The numerical results obtained show that this drawback can be circumvented using a parallel implementation.

The rest of this paper is organized as follows. In Section 2, we describe the serial algorithm along with its main convergence results. In Section 3, different parallel implementations of the serial algorithm are presented. Emphasis is given to the case where the dimension of the problem is bigger than the number of processors available. In Section 4, we briefly discuss the use of the parallel algorithm to solve unconstrained minimization problems. In Section 5, some numerical results obtained on the Encore/Multimax located at Argonne National Laboratory are presented. Finally, in Section 6 we present future work and draw some conclusions.

2. The serial nonlinear Jacobi algorithm. Consider the following system of nonlinear equations

$$ F(x) = 0, \tag{2.1} $$

where $F: \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable function in an open subset Ω of $\mathbb{R}^n$. We will assume there exists $x_* \in \Omega$ such that $F(x_*) = 0$ and $F'(x_*)^{-1}$ exists.

The nonlinear Jacobi algorithm is a particular case of a more general class of algorithms, the nonlinear SOR algorithms. These algorithms are based on the following idea. A basic step of the nonlinear Gauss-Seidel iteration is to solve the ith equation


. . 4*’)  -  0 


(2.2) 


for  z,  and  to  set 


,(*) 

•‘‘t 


(2.3) 


where  m,-  correspond  to  the  number  of  inner  steps  performed  in  solving  (2.2).  Thus,  in 
order  to  obtain  i**"*"'*  from  we  solve  successively  the  n  one-dimensional  nonlinear 
equations  (2.2)  for  i=l,...,n.  More  generally,  we  may  set 

=  xW  +  -  xW)  (2.4) 


in  order  to  obtain  a  nonlinear  SOR  method  where  is  a  parameter  varying  with  k. 
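To make (2.2)-(2.4) concrete, here is a minimal sketch of one nonlinear SOR sweep. It assumes the diagonal partial derivatives $\partial f_i/\partial x_i$ are available in closed form and uses Newton's method as the inner iteration; the function names are hypothetical, and the Broyden tridiagonal function (problem 13 in the test set used later) serves only as a convenient test case.

```python
from typing import Callable, List

Vec = List[float]
Component = Callable[[int, Vec], float]   # (i, x) -> value for component i


def nonlinear_sor_sweep(f: Component, dfi: Component, x: Vec,
                        omega: float = 1.0, inner_steps: int = 1) -> Vec:
    """One sweep of nonlinear SOR, eqs. (2.2)-(2.4) (illustrative sketch).

    Equation i is solved for x_i by `inner_steps` Newton steps, using the
    components already updated in this sweep (Gauss-Seidel ordering).  The
    result is then relaxed with `omega`; omega = 1 gives nonlinear Gauss-Seidel.
    """
    x = list(x)                                # copy; entries are overwritten in order
    for i in range(len(x)):
        old = x[i]
        for _ in range(inner_steps):           # m_i inner Newton steps on (2.2)
            x[i] -= f(i, x) / dfi(i, x)
        x[i] = old + omega * (x[i] - old)      # relaxation step (2.4)
    return x


def broyden_f(i: int, x: Vec) -> float:
    """Component f_i of the Broyden tridiagonal function, with x_0 = x_{n+1} = 0."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return (3.0 - 2.0 * x[i]) * x[i] - left - 2.0 * right + 1.0


def broyden_dii(i: int, x: Vec) -> float:
    """Diagonal partial derivative df_i/dx_i of the Broyden tridiagonal function."""
    return 3.0 - 4.0 * x[i]


if __name__ == "__main__":
    x = [-1.0] * 8                             # standard starting point for this problem
    for _ in range(100):
        x = nonlinear_sor_sweep(broyden_f, broyden_dii, x, omega=1.0)
    print(max(abs(broyden_f(i, x)) for i in range(len(x))))
```

Setting `omega` to a value other than 1 gives the relaxed variant (2.4); the inner loop is where a different secondary iteration could be substituted.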

In an analogous fashion, the kth stage of the nonlinear Jacobi iteration may be defined by solving the equations

$$f_i\big(x_1^{(k)}, \ldots, x_{i-1}^{(k)}, x_i, x_{i+1}^{(k)}, \ldots, x_n^{(k)}\big) = 0, \qquad i = 1, \ldots, n, \qquad (2.5)$$

for $x_i$ and setting $x_i^{(k+1)} = x_i^{(k,m_i)}$ for $i = 1, \ldots, n$.

Notice that these methods are well defined only if the equations (2.2) or (2.5) have unique solutions in the domain under consideration; conditions must be imposed to ensure that this is true.


We now restrict ourselves to the nonlinear Jacobi iteration. The iterative method used to solve (2.5) plays the role of a secondary iteration, while the Jacobi (SOR) method is the primary iteration. There are different ways of implementing the algorithm, depending on the number of inner steps $m_i$ taken to obtain the solution of (2.5) and on the iterative method used to solve (2.5). Namely, if $m_i = 1$, only one step is carried out to obtain $x_i^{(k+1)}$ from $x_i^{(k)}$, and if Newton's method is used to solve (2.5), we get the one-step Jacobi-Newton method. In this case, $x_i^{(k+1)}$ is given by

$$x_i^{(k+1)} = x_i^{(k)} - \frac{f_i(x^{(k)})}{b_i^{(k)}}, \qquad (2.6)$$

where $b_i^{(k)} = \dfrac{\partial f_i}{\partial x_i}(x^{(k)})$ and $x^{(k)} = (x_1^{(k)}, \ldots, x_n^{(k)})$. Notice that the starting point for Newton's method, and for any other iterative method used to solve (2.5), is $x_i^{(k,0)} = x_i^{(k)}$. It is worth noting that the one-step Jacobi-Newton method generates the same iterates as the one-step Newton-Jacobi method, in which Newton's method is used to solve (1.1) and one step of the Jacobi algorithm is used to solve the linear system (1.2.a).
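The one-step Jacobi-Newton update (2.6) can be sketched as follows. Every component is computed from the same old iterate $x^{(k)}$, which is what makes the n updates independent and therefore parallelizable. The helper names are hypothetical, and the Broyden tridiagonal function is again only an assumed test case.

```python
from typing import Callable, List, Tuple

Vec = List[float]
Component = Callable[[int, Vec], float]


def jacobi_newton_step(f: Component, dfi: Component, x: Vec) -> Vec:
    """One step of the one-step Jacobi-Newton method, eq. (2.6).

    x_i^(k+1) = x_i^(k) - f_i(x^(k)) / b_i^(k),  b_i^(k) = df_i/dx_i (x^(k)).
    Every component reads only the old iterate x, so there is no dependence
    between different i (each could be handled by a different processor).
    """
    return [x[i] - f(i, x) / dfi(i, x) for i in range(len(x))]


def broyden_f(i: int, x: Vec) -> float:
    """Broyden tridiagonal component f_i (hypothetical test problem)."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return (3.0 - 2.0 * x[i]) * x[i] - left - 2.0 * right + 1.0


def broyden_dii(i: int, x: Vec) -> float:
    return 3.0 - 4.0 * x[i]


def solve(n: int = 16, tol: float = 1e-10, max_iter: int = 500) -> Tuple[Vec, int]:
    """Iterate (2.6) from x = (-1, ..., -1) until max_i |f_i(x)| <= tol."""
    x = [-1.0] * n
    for k in range(max_iter):
        if max(abs(broyden_f(i, x)) for i in range(n)) <= tol:
            return x, k
        x = jacobi_newton_step(broyden_f, broyden_dii, x)
    return x, max_iter
```

Because a list comprehension over `x` never writes into `x`, the sketch cannot accidentally turn into a Gauss-Seidel sweep.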

One can also use a secant method for solving (2.5), as suggested by Wegge [23], and obtain the one-step Jacobi-secant iteration. We simply replace the partial derivative in (2.6) by

$$b_i^{(k)} = \frac{f_i(x^{(k)}) - f_i\big(x^{(k)} + (x_i^{(k-1)} - x_i^{(k)})\,e_i\big)}{x_i^{(k)} - x_i^{(k-1)}}, \qquad (2.7)$$

where $e_i$ denotes the ith column of the identity matrix. One could also use the more recent secant implementation proposed by Dennis and Walker [6], in which $b_i^{(k)}$ is allowed to equal $b_i^{(k-1)}$ in some particular instances.
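A sketch of the one-step Jacobi-secant iteration, i.e. (2.6) with $b_i^{(k)}$ taken from (2.7), is given below. It needs two starting iterates; the second one ($-0.9$ in every component) is an arbitrary assumption, as are the helper names. When a component has not moved ($x_i^{(k)} = x_i^{(k-1)}$) the secant slope is undefined, and this sketch simply leaves that component unchanged.

```python
from typing import Callable, List, Tuple

Vec = List[float]
Component = Callable[[int, Vec], float]


def jacobi_secant_step(f: Component, x_prev: Vec, x: Vec) -> Vec:
    """One step of the one-step Jacobi-secant method: (2.6) with b_i from (2.7).

    b_i = [f_i(x^(k)) - f_i(x^(k) + (x_i^(k-1) - x_i^(k)) e_i)] / (x_i^(k) - x_i^(k-1))
    """
    new = []
    for i in range(len(x)):
        h = x[i] - x_prev[i]
        if h == 0.0:                  # no secant information: keep the component
            new.append(x[i])
            continue
        shifted = list(x)
        shifted[i] = x_prev[i]        # x^(k) + (x_i^(k-1) - x_i^(k)) e_i
        fi = f(i, x)
        b = (fi - f(i, shifted)) / h  # secant slope (2.7)
        new.append(x[i] - fi / b)     # update (2.6)
    return new


def broyden_f(i: int, x: Vec) -> float:
    """Broyden tridiagonal component f_i (hypothetical test problem)."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return (3.0 - 2.0 * x[i]) * x[i] - left - 2.0 * right + 1.0


def solve(n: int = 16, tol: float = 1e-10, max_iter: int = 500) -> Tuple[Vec, int]:
    x_prev = [-1.0] * n
    x = [-0.9] * n                    # second starting iterate (arbitrary choice)
    for k in range(max_iter):
        if max(abs(broyden_f(i, x)) for i in range(n)) <= tol:
            return x, k
        x_prev, x = x, jacobi_secant_step(broyden_f, x_prev, x)
    return x, max_iter
```

Stopping on the residual, rather than iterating to the limit of machine precision, avoids forming secant slopes from differences that are dominated by rounding error.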

Theoretically, taking more than one inner step when solving (2.5) does not improve the rate of convergence of the algorithm (see [22]). Our numerical experiments confirm this for the majority of the test problems; however, for certain problems we obtained faster convergence by using more than one inner step.
We now give an informal statement of two local convergence results for the serial nonlinear Jacobi algorithm. The first can be found in [17, Theorem 10.3.5]; the second is due to Dennis and Walker [6, Theorem 4.1]. For both we assume that $x^{(0)}$ is in a neighborhood of $x^*$ and that F satisfies the standard conditions stated at the beginning of this section. Let

$$F'(x) = D(x) - L(x) - U(x) \qquad (2.8)$$

be the decomposition of $F'(x)$ into its diagonal, strictly lower-triangular, and strictly upper-triangular parts, and assume that $D_* = D(x^*)$ is nonsingular. Let $U_* = U(x^*)$ and $L_* = L(x^*)$; note that $I - D_*^{-1}F'(x^*) = D_*^{-1}(L_* + U_*)$, the Jacobi iteration matrix at $x^*$. The first result treats the general nonlinear Jacobi iteration, where (2.5) is solved by no specific method. It says essentially that if $\rho(I - D_*^{-1}F'(x^*)) < 1$, then the sequence $\{x^{(k)}\}$ converges to $x^*$ r-linearly. The second result, which deals with the one-step Jacobi-secant iteration, states that if $\rho(I - D_*^{-1}F'(x^*)) < 1$, then the sequence $\{x^{(k)}\}$ converges to $x^*$ q-linearly.
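Checking $\rho(I - D_*^{-1}F'(x^*)) < 1$ exactly requires eigenvalues. A cheap sufficient test, sketched below, uses the fact that the spectral radius is bounded by any induced matrix norm: if $\|D^{-1}(L+U)\|_\infty < 1$ (strict row diagonal dominance), the condition holds. The function name is hypothetical, and a value of 1 or more is inconclusive rather than a proof of divergence.

```python
from typing import List

Mat = List[List[float]]


def jacobi_inf_norm(jac: Mat) -> float:
    """Return ||D^{-1}(L + U)||_inf for the splitting F'(x) = D - L - U of (2.8).

    Row i of D^{-1}(L + U) has entries -J_ij / J_ii (j != i), so the norm is the
    largest ratio of off-diagonal absolute row sum to |diagonal entry|.
    Since rho(A) <= ||A|| for any induced norm, a value < 1 guarantees
    rho(I - D^{-1} F'(x)) < 1; a value >= 1 is inconclusive.
    """
    worst = 0.0
    for i, row in enumerate(jac):
        if row[i] == 0.0:
            raise ValueError("D(x) is singular")
        off = sum(abs(a) for j, a in enumerate(row) if j != i)
        worst = max(worst, off / abs(row[i]))
    return worst


if __name__ == "__main__":
    # Jacobian of the Broyden tridiagonal function at x = (-1, ..., -1), n = 4:
    # diagonal 3 - 4 x_i = 7, subdiagonal -1, superdiagonal -2.
    n = 4
    jac = [[7.0 if j == i else -1.0 if j == i - 1 else -2.0 if j == i + 1 else 0.0
            for j in range(n)] for i in range(n)]
    print(jacobi_inf_norm(jac))   # equals 3/7 (interior rows), so the test passes
```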

An important feature of the nonlinear Jacobi (SOR) method is that it can be extended to block form. Partition x as $x = (x^1, \ldots, x^m)$, with $x^i \in \mathbb{R}^{n_i}$, and group the components $f_j$ of F correspondingly into mappings $F_i : \mathbb{R}^n \to \mathbb{R}^{n_i}$, $i = 1, \ldots, m$. Then solving

$$F_i\big((x^1)^{(k)}, \ldots, (x^{i-1})^{(k)}, x^i, (x^{i+1})^{(k)}, \ldots, (x^m)^{(k)}\big) = 0, \qquad i = 1, \ldots, m,$$

for $x^i$ describes a nonlinear block Jacobi process in which a complete iteration requires the solution of m nonlinear systems of dimension $n_i$, $i = 1, \ldots, m$. This approach will be used extensively in the parallel implementation that follows.
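The block process above can be sketched in plain Python as follows. Each block subsystem is solved by a fixed number of Newton steps with the other blocks frozen at the old iterate; this Newton inner solver and a small Gaussian elimination routine stand in for the MINPACK call used later in the paper, and all names (including the Broyden tridiagonal test problem) are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vec = List[float]
Mat = List[List[float]]


def solve_linear(a: Mat, b: Vec) -> Vec:
    """Solve a small dense system a y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]    # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))    # pivot row
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):                           # back substitution
        y[r] = (m[r][n] - sum(m[r][k] * y[k] for k in range(r + 1, n))) / m[r][r]
    return y


def block_jacobi_step(F: Callable[[Vec], Vec], J: Callable[[Vec], Mat], x: Vec,
                      blocks: Sequence[Sequence[int]], inner_steps: int = 1) -> Vec:
    """One sweep of nonlinear block Jacobi (sketch).

    For each block, F_i(..., x^i, ...) = 0 is solved for x^i by `inner_steps`
    Newton steps with all other blocks frozen at the old iterate x, so the
    blocks are independent.  For brevity F and J are evaluated in full and only
    the rows/columns of the block are used.
    """
    new = list(x)
    for idx in blocks:
        y = list(x)                          # private copy: other blocks stay at x^(k)
        for _ in range(inner_steps):
            fy, jy = F(y), J(y)
            a = [[jy[r][c] for c in idx] for r in idx]       # diagonal block of F'
            step = solve_linear(a, [-fy[r] for r in idx])
            for c, s in zip(idx, step):
                y[c] += s
        for c in idx:
            new[c] = y[c]
    return new


def broyden_F(x: Vec) -> Vec:
    """Broyden tridiagonal function (hypothetical test problem)."""
    n = len(x)
    return [(3.0 - 2.0 * x[i]) * x[i] - (x[i - 1] if i > 0 else 0.0)
            - 2.0 * (x[i + 1] if i < n - 1 else 0.0) + 1.0 for i in range(n)]


def broyden_J(x: Vec) -> Mat:
    """Its tridiagonal Jacobian."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for i in range(n):
        jac[i][i] = 3.0 - 4.0 * x[i]
        if i > 0:
            jac[i][i - 1] = -1.0
        if i < n - 1:
            jac[i][i + 1] = -2.0
    return jac
```

For example, with `blocks = [[0, 1], [2, 3], [4, 5], [6, 7]]` and `x = [-1.0] * 8`, repeated calls to `block_jacobi_step(broyden_F, broyden_J, x, blocks)` drive the residual towards zero; each block could be handed to a different processor.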
3.  Parallel implementation.  For the purpose of this presentation we assume a parallel computer, with or without shared memory, with p processors, each able to sustain a heavy load of floating point computation. This is the case for most current parallel computers, such as the N-Cube, the Encore/Multimax, the Sequent/Balance, the BBN Butterfly and the Alliant, to name just a few. We must also assume that a mechanism for transferring data among processors exists, such as the Monitor System [12], the Domino System [16] or the DPUP System [10], among others.

The parallel implementation of the nonlinear Jacobi algorithm follows directly from (2.5). Each processor is assigned an index i and has the task of computing the solution of (2.5) for that index. In this way, the parallel implementation allows the user to work with a different iterative method to solve (2.5) on each processor; the method may be a secant or a Newton method. Moreover, since each equation in (2.5) is a one-dimensional problem, one may use a bisection method combined with Newton's or a secant method, such as Brent's algorithm [3]. The globalization technique used to ensure convergence far from the solution may also vary among processors, and so may the number of inner steps $m_i$ used for each subproblem (2.5). The parallel algorithm therefore gives the user great flexibility in deciding which implementation to use for a particular problem.
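As an illustration of combining bisection with Newton's method for a scalar subproblem (2.5), here is a minimal safeguarded Newton sketch: it keeps a sign-change bracket and falls back to bisection whenever the Newton step would leave it. This is a simpler relative of Brent's algorithm, not Brent's algorithm itself (which safeguards secant and inverse quadratic interpolation steps; `scipy.optimize.brentq` is a library implementation of it). The function name and default tolerances are assumptions.

```python
from typing import Callable

Fn = Callable[[float], float]


def newton_bisection(g: Fn, dg: Fn, a: float, b: float,
                     tol: float = 1e-12, max_steps: int = 100) -> float:
    """Safeguarded Newton method for a scalar equation g(t) = 0 on [a, b].

    Requires a < b and g(a), g(b) of opposite sign.  The bracket is updated at
    every step; a Newton step is accepted only if it lands strictly inside the
    current bracket, otherwise the midpoint is used (a bisection step).
    """
    if not a < b:
        raise ValueError("need a < b")
    ga, gb = g(a), g(b)
    if ga == 0.0:
        return a
    if gb == 0.0:
        return b
    if ga * gb > 0.0:
        raise ValueError("g(a) and g(b) must have opposite signs")
    t = 0.5 * (a + b)
    for _ in range(max_steps):
        gt = g(t)
        if abs(gt) <= tol:
            return t
        if ga * gt < 0.0:            # sign change in [a, t]
            b = t
        else:                        # sign change in [t, b]
            a, ga = t, gt
        if b - a <= tol:
            return 0.5 * (a + b)
        d = dg(t)
        newton = t - gt / d if d != 0.0 else None
        t = newton if newton is not None and a < newton < b else 0.5 * (a + b)
    return t
```

For instance, the scalar Broyden-tridiagonal subproblem $-2t^2 + 3t + 4 = 0$ (neighbors fixed so that the constant term is 4) can be solved on $[-2, 0]$ with `newton_bisection(lambda t: -2*t*t + 3*t + 4.0, lambda t: -4*t + 3.0, -2.0, 0.0)`.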

If we try to solve (2.5) on each processor to a given tolerance, some processors may sit idle waiting for the most nonlinear component functions to converge. To deal with this problem, a fixed maximum number of inner steps is allowed on each processor: a processor stops computing when either a given tolerance is reached or the given number of inner steps is attained. It is worth noting that on some of the test problems we obtained better convergence by using more than one inner step.
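The task structure just described (one subproblem (2.5) per task, each capped at a maximum number of inner steps) can be sketched with Python's `concurrent.futures`. This is only an analogy to the Monitors-based Fortran setup: a thread pool shows the structure, but CPython threads do not run pure-Python arithmetic concurrently, so a `ProcessPoolExecutor` (or compiled code) would be needed for real speedup. The helper names, the default cap of 5 inner steps, and the test problem are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Vec = List[float]
Component = Callable[[int, Vec], float]


def solve_component(f: Component, dfi: Component, x: Vec, i: int,
                    tol: float, max_inner: int) -> Tuple[float, int]:
    """Approximately solve equation i of (2.5) for x_i by Newton's method.

    Stops when |f_i| <= tol or after max_inner steps, whichever comes first,
    so no worker waits indefinitely on a hard component.
    Returns the new x_i and the number of inner steps used.
    """
    y = list(x)                        # private copy; other components stay at x^(k)
    steps = 0
    while steps < max_inner:
        fi = f(i, y)
        if abs(fi) <= tol:
            break
        y[i] -= fi / dfi(i, y)
        steps += 1
    return y[i], steps


def parallel_jacobi_step(pool: ThreadPoolExecutor, f: Component, dfi: Component,
                         x: Vec, tol: float = 1e-12, max_inner: int = 5) -> Vec:
    """Submit the n independent subproblems (2.5) as tasks and gather the results."""
    futures = [pool.submit(solve_component, f, dfi, x, i, tol, max_inner)
               for i in range(len(x))]
    return [fut.result()[0] for fut in futures]


def broyden_f(i: int, x: Vec) -> float:
    """Broyden tridiagonal component f_i (hypothetical test problem)."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return (3.0 - 2.0 * x[i]) * x[i] - left - 2.0 * right + 1.0


def broyden_dii(i: int, x: Vec) -> float:
    return 3.0 - 4.0 * x[i]


def run(n: int = 16, iters: int = 200, workers: int = 4) -> Vec:
    x = [-1.0] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            x = parallel_jacobi_step(pool, broyden_f, broyden_dii, x)
    return x
```

Gathering the futures in submission order keeps the result vector aligned with the component indices regardless of which task finishes first.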
The most interesting case to consider is $p \ll n$. In this case we solve (1.1) by the nonlinear block Jacobi algorithm presented in the previous section. We load all processors evenly with a partition of the components $f_j$ of F into groups (say $F_i$). Each processor then solves a nonlinear system of equations, which reduces the relative overhead of communication among processors, since each processor performs a considerable amount of floating point computation. The dimensions of the subsystems may differ. At this point one can use a standard serial nonlinear solver on each processor (e.g., MINPACK). Once more, a fixed number of inner steps may be appropriate to avoid idle processors at each iteration.
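Loading the processors evenly starts with a partition of the n components into nearly equal blocks. The sketch below builds contiguous blocks whose sizes differ by at most one and, as an assumption (the text does not say how blocks were mapped to processors when there are more blocks than processors), deals the blocks to processors round-robin. Both function names are hypothetical.

```python
from typing import List


def contiguous_blocks(n: int, m: int) -> List[List[int]]:
    """Split component indices 0..n-1 into m contiguous blocks whose sizes
    differ by at most one (equal sizes when m divides n)."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    base, extra = divmod(n, m)
    blocks, start = [], 0
    for b in range(m):
        size = base + (1 if b < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def deal_blocks(num_blocks: int, p: int) -> List[List[int]]:
    """Assign block numbers to p processors round-robin ("card dealing"):
    processor q gets blocks q, q + p, q + 2p, ...  (assumed mapping)."""
    return [list(range(q, num_blocks, p)) for q in range(p)]


if __name__ == "__main__":
    # The five partitions used for n = 32 in the experiments: 2, 4, 8, 16 and 32 blocks.
    for m in (2, 4, 8, 16, 32):
        sizes = [len(blk) for blk in contiguous_blocks(32, m)]
        print(m, "blocks of dimension", sizes[0])
```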

4.  Numerical experiments.  All the experiments were performed on an Encore/Multimax at the Advanced Computing Research Facility of Argonne National Laboratory. The Encore/Multimax has 20 processors and 20 Mbytes of memory. Each processor is a National Semiconductor 32032 chip set running at 10 MHz. The processors are connected via a 64-bit wide bus with a data transfer rate of 100 Mbytes per second. The operating system is UNIX™.

As  a  synchronization  and  communication  system  among  processors  we  used  the 
FORTRAN  version  of  the  Monitors  macros  developed  by  Lusk  and  Overbeek  [12].  This 
system  allows  one  to  set  up  a  pool  of  tasks  which  are  solved  in  parallel  by  the  processors. 

The test problems were selected from the standard set of problems in Garbow, Hillstrom and Moré [9]. Fourteen of these problems are systems of nonlinear equations, numbered 1 to 14; we keep the same numbering here. They are:
1.  Rosenbrock function

2.  Powell  singular  function 

3.  Powell  badly  scaled  function 

4.  Wood  function 

5.  Helical  valley  function 

6.  Watson  function 

7.  Chebyquad  function 


8.  Brown  almost-linear  function 

9.  Discrete boundary value function

10.  Discrete  integral  equation  function 

11.  Trigonometric  function 

12.  Variably  dimensioned  function 

13.  Broyden  tridiagonal  function 

14.  Broyden  banded  function. 


The first five test functions have dimensions 2, 4, 2, 4 and 3, respectively, while the remaining test functions are of variable dimension. For more information on these problems see [9]. We ran the problems of variable dimension with dimensions 4, 8, 16, and 32. To problem number 5, the Helical valley function, we added one extra function in order to obtain convergence with the nonlinear Jacobi method. The nonlinear Jacobi method was unable to solve problems 6, 7, 8 and 12.

We compare the parallel nonlinear Jacobi algorithm with the nonlinear equation solver in MINPACK, which we take as the best available serial solver. MINPACK's algorithm is well suited to small and medium size problems with expensive function evaluations. The test problems chosen are of small and medium size, but their functions are not expensive to evaluate. On the other hand, the nonlinear Jacobi algorithm was designed for large problems, and therefore its parallel implementation cannot be expected to perform as well on this particular set of problems. We must keep in mind that the main purpose of the numerical experiments is to study different parallel implementations of a linearly convergent algorithm and to compare their performance against a superlinearly convergent algorithm such as the one in MINPACK. It is not our intention to claim that our algorithm is superior to the MINPACK algorithm. The numerical results will allow us to pinpoint synchronization bottlenecks in the parallel implementation, possible drawbacks due to a lack of reliability, and the advantages of using this parallel algorithm on problems with a particular structure.

We used MINPACK on the Encore/Multimax to solve the same set of problems. We emphasize that MINPACK was successful on all fourteen problems, for all the different dimensions, except number 11. We used the double precision version of HYBRD, which solves systems of nonlinear equations by a modification of Powell's hybrid method; in this subroutine the Jacobian is approximated by forward differences. For all problems we used the initial points given in Garbow, Hillstrom and Moré [9]. The convergence tolerance was set at 10~* and was used in the stopping criteria: we stopped when either the maximum number of iterations (100) was attained or the relative error between two consecutive iterates was less than or equal to the tolerance.
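The stopping rule just described can be sketched as a small driver. The text does not say which norm is used for the relative error, so the infinity norm (with a fallback to the absolute change when the new iterate is zero) is an assumption, as are the function names; the INFO values 1 and 2 follow the convention used in the tables below.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def relative_change(x_new: Vec, x_old: Vec) -> float:
    """max_i |x_new_i - x_old_i| / max_i |x_new_i|  (infinity norms: an assumed choice)."""
    diff = max(abs(a - b) for a, b in zip(x_new, x_old))
    scale = max(abs(a) for a in x_new)
    return diff / scale if scale > 0.0 else diff


def iterate(step: Callable[[Vec], Vec], x0: Vec, tol: float,
            max_iter: int = 100) -> Tuple[Vec, int, int]:
    """Apply `step` until the relative change between consecutive iterates is
    <= tol (INFO = 1) or max_iter iterations are done (INFO = 2).
    Returns (x, info, iterations)."""
    x = list(x0)
    for k in range(1, max_iter + 1):
        x_new = step(x)
        if relative_change(x_new, x) <= tol:
            return x_new, 1, k
        x = x_new
    return x, 2, max_iter
```

Any of the iteration steps sketched earlier (for example a Jacobi-Newton or block Jacobi sweep) can be passed as `step`.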

Let us now focus on the implementation of the parallel algorithm and on the numerical results obtained for the set of problems mentioned above. We implemented the nonlinear block Jacobi algorithm as presented in Section 2. Partition x as $x = (x^1, \ldots, x^m)$, with $x^i \in \mathbb{R}^{n_i}$, and group the components $f_j$ of F correspondingly into mappings $F_i : \mathbb{R}^n \to \mathbb{R}^{n_i}$, $i = 1, \ldots, m$. Then solve

$$F_i\big((x^1)^{(k)}, \ldots, (x^{i-1})^{(k)}, x^i, (x^{i+1})^{(k)}, \ldots, (x^m)^{(k)}\big) = 0, \qquad i = 1, \ldots, m, \qquad (4.1)$$

for $x^i$. This requires the solution of m nonlinear systems of dimension $n_i$, $i = 1, \ldots, m$, at each iteration. Each of these subsystems is solved by a different processor using the MINPACK subroutine mentioned above. As pointed out in Section 3, in order to avoid idle processors during the computation, a predetermined number of inner steps is allowed in each call to MINPACK. We did runs with 1, 5, and 10 inner steps, and stopped solving a subproblem when either the tolerance was achieved or the maximum number of inner steps was performed. The dimensions of the subproblems were equal; however, one can use different dimensions for different subproblems.

For  a  given  dimension  of  a  problem  we  experimented  with  different  partitions.  For 
instance,  for  a  problem  of  dimension  32  we  ran  5  different  partitions:  with  2  blocks  of 
dimension  16;  with  4  blocks  of  dimension  8;  with  8  blocks  of  dimension  4;  with  16  blocks 
of  dimension  2;  with  32  one-dimensional  blocks.  For  each  given  partition  we  used  one 
processor  per  block. 

One additional advantage of the parallel algorithm is that function evaluations are implicitly performed in parallel: when solving (4.1) for block i, only the component functions in that group, $F_i$, need to be evaluated. Therefore, substantial savings in time are obtained over the serial algorithm and over MINPACK.
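The point that a block subproblem touches only its own component functions, and the counting convention used for NFEV in the tables below (each evaluation of a single $f_i$ counts once, so a full MINPACK evaluation counts n), can be illustrated with a counting wrapper. The class and function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vec = List[float]


class CountingComponents:
    """Wrap a component function f(i, x) and count evaluations of each f_i,
    mirroring the NFEV convention in which one f_i evaluation counts as one."""

    def __init__(self, f: Callable[[int, Vec], float]) -> None:
        self.f = f
        self.counts: Dict[int, int] = {}

    def __call__(self, i: int, x: Vec) -> float:
        self.counts[i] = self.counts.get(i, 0) + 1
        return self.f(i, x)

    def total(self) -> int:
        return sum(self.counts.values())


def block_residual(f: Callable[[int, Vec], float], x: Vec, block: Sequence[int]) -> Vec:
    """Evaluate only the components belonging to one block (the group F_i)."""
    return [f(i, x) for i in block]


def minpack_nfev_as_components(vector_evals: int, n: int) -> int:
    """A MINPACK evaluation computes all n components at once, so it counts n."""
    return vector_evals * n
```

For example, the MINPACK entry NFEV = 112 for the 8-dimensional problem in Table I corresponds to 14 evaluations of the full vector F.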

To time the experiments we used the FORTRAN function etime (UNIX™), which returns the elapsed runtime in seconds. It takes a two-element array as argument, in which it returns the user time (first element) and the system time (second element). In all our experiments we used only the first element, the user time.
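A rough Python analogue of this measurement is sketched below: `os.times()` reports the user and system CPU time of the current process, the same two quantities etime returns, and `time.perf_counter()` is added for wall-clock comparison. The function name is hypothetical. One design caveat: in a multithreaded Python process the user CPU time accumulates over all threads, so for speedup measurements of a threaded run the wall-clock time is usually the more meaningful number.

```python
import os
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float, float, float]:
    """Run fn() and return (result, user_cpu, system_cpu, wall) in seconds.

    os.times() gives the process's user and system CPU time (the two values
    etime fills in); time.perf_counter() gives elapsed wall-clock time.
    """
    t0, w0 = os.times(), time.perf_counter()
    result = fn()
    t1, w1 = os.times(), time.perf_counter()
    return result, t1.user - t0.user, t1.system - t0.system, w1 - w0
```

Reporting only the user component, as in the experiments, corresponds to using the second element of the returned tuple.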

Let us now discuss the numerical results. The complete set of numerical results can be found in [8]; here we present a sample of the most interesting problems. Each table shows the following columns:
NPRO  number of processors
NBLK  number of blocks in the partition into subproblems
NSTP  number of inner steps used to solve each subproblem
NFEV  number of function evaluations
INFO  stopping message
TIME  elapsed runtime (system time is not included)
Sp    speedup over the serial algorithm
S'p   speedup over MINPACK
Ep    efficiency of the parallel implementation

The number of processors NPRO ranges from 1 (serial algorithm) to 20 (the maximum number of processors available on the Encore/Multimax). MINPACK results are always in the first row of each table, where NPRO and NBLK are both one. When NPRO is one and NBLK is different from one, we are using the serial nonlinear Jacobi algorithm. The number of inner steps NSTP can be 1, 5, or 10; the tables show only the best-performing value. NSTP = 1 is often enough to obtain fast convergence, although for some problems more than one inner step was necessary. The number of function evaluations is obtained by counting each evaluation of a component function $f_i$; thus, for MINPACK we multiplied the number of function evaluations by n, the dimension of the problem. MINPACK has several stopping messages; however, the only two we came across were 1, for a successful run, and 4, when the iteration is not making good progress. We use INFO=2 whenever the maximum number of iterations is attained. All timings in TIME are given in seconds. For any given parallel algorithm, three numbers indicate how well it is performing. The speedup $S_p$ is defined by
$$S_p = \frac{\text{running time for the serial algorithm}}{\text{running time for the parallel algorithm using } p \text{ processors}},$$


where the serial algorithm is the nonlinear Jacobi method with a given block partition and the parallel algorithm is the parallel version of the Jacobi algorithm with the same block partition. This number cannot exceed the number of processors used in the computation. To measure how much faster the parallel algorithm is than the best serial algorithm, we calculate $S'_p$, defined by

$$S'_p = \frac{\text{running time of the best serial algorithm}}{\text{running time of the parallel algorithm using } p \text{ processors}},$$


where the best serial algorithm is MINPACK. A zero in this column means that MINPACK was faster than the given combination of processors, blocks and steps. An entry ∞ means that MINPACK failed to converge while the parallel algorithm was successful; this occurs only in problem 11.


To measure the efficiency of our parallel implementation we calculate $E_p$, defined by

$$E_p = \frac{S_p}{p}, \qquad p = \text{number of processors}.$$

Since $S_p \le p$, this number cannot exceed one. The numbers in the last three columns of Tables I through III have been rounded to two decimal places.
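The three measures can be computed as in the following sketch, which also encodes the reporting conventions described above (a ratio below one is printed as 0 in the $S'_p$ column, and a MINPACK failure is printed as ∞); the function names are hypothetical. The example row is taken from Table II (16 processors, 32 blocks).

```python
import math
from typing import Optional


def speedup(serial_time: float, parallel_time: float) -> float:
    """S_p: serial nonlinear Jacobi time over parallel time (same block partition)."""
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, p: int) -> float:
    """E_p = S_p / p."""
    return speedup(serial_time, parallel_time) / p


def speedup_over_minpack(minpack_time: Optional[float], parallel_time: float) -> float:
    """S'_p as reported in the tables: 0 when MINPACK was faster,
    infinity when MINPACK failed (minpack_time is None)."""
    if minpack_time is None:
        return math.inf
    ratio = minpack_time / parallel_time
    return ratio if ratio >= 1.0 else 0.0


if __name__ == "__main__":
    # Table II, 16 processors, 32 blocks: serial Jacobi 9.883 s, parallel 0.833 s,
    # MINPACK 12.783 s.
    print(round(speedup(9.883, 0.833), 2),
          round(speedup_over_minpack(12.783, 0.833), 2),
          round(efficiency(9.883, 0.833, 16), 2))   # prints 11.86 15.35 0.74
```

Note that $E_p$ is computed from the unrounded $S_p$, which reproduces the rounded values printed in the tables.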

Table IV summarizes the results of the parallel Jacobi algorithm together with the corresponding MINPACK results. For each problem we include the combination of processors, number of blocks and steps that gave the best timing. Column MPACK gives the MINPACK timing, column JAC-P the timing of the parallel algorithm, and the last column the ratio $S'_p$. Table V presents the analogous data for the serial nonlinear Jacobi algorithm: columns JAC-S and JAC-P give the timings of the serial and parallel algorithms, respectively, and the last column gives the speedup obtained.


Problem: 9   Dimension: 8

NPRO  NBLK  NSTP   NFEV  INFO    TIME     Sp    S'p    Ep
   1     1     1    112     1   0.400
   1     2     1   3424     1   5.400
   1     4     1   4168     1   8.150
   1     8     1   6184     1  17.300
   2     2     1   3424     1   2.867   1.88   0.    0.94
   2     4     1   4168     1   4.350   1.87   0.    0.94
   2     8     1   6184     1   9.333   1.85   0.    0.93
   4     4     1   4168     1   2.483   3.28   0.    0.82
   4     8     1   6184     1   5.200   3.33   0.    0.83
   8     8     1   6184     1   3.267   5.30   0.    0.66

Table I
Problem: 10   Dimension: 32

NPRO  NBLK  NSTP   NFEV  INFO    TIME     Sp    S'p    Ep
   1     1     1   1216     1  12.783
   1     2     1   6112     1  41.450
   1     4     1   3552     1  22.317
   1     8     1   2048     1  12.867
   1    16     1   1632     1  10.967
   1    32     1   1312     1   9.883
   2     2     1   6112     1  21.017   1.97   0.    0.99
   2     4     1   3552     1  11.233   1.99   1.14  0.99
   2     8     1   2048     1   6.450   1.99   1.98  1.00
   2    16     1   1632     1   5.500   1.99   2.32  1.00
   2    32     1   1312     1   5.033   1.96   2.54  0.98
   4     4     1   3552     1   5.783   3.86   2.21  0.96
   4     8     1   2048     1   3.317   3.88   3.85  0.97
   4    16     1   1632     1   2.833   -      4.51  -
   4    32     1   1312     1   2.617   3.78   4.88  0.94
   8     8     1   2048     1   1.750   7.35   7.30  0.92
   8    16     1   1632     1   1.500   7.31   8.52  0.91
   8    32     1   1312     1   1.400   7.06   9.13  0.88
  16    16     1   1632     1   0.917  11.96  13.94  0.75
  16    32     1   1312     1   0.833  11.86  15.35  0.74
  20    32     1   1312     1   1.117   8.85  11.44  0.44

Table II
Problem: 14   Dimension: 32

NPRO  NBLK  NSTP   NFEV  INFO    TIME     Sp    S'p    Ep
   1     1     1   1632     1  16.850
   1     2     1   4896     1  10.250
   1     4     1   2848     1   4.783
   1     8     1   2048     1   3.417
   1    16     1   1632     1   3.383
   1    32     1   1440     1   4.400
   2     2     1   4896     1   5.183   1.98   3.25  0.99
   2     4     1   2848     1   2.417   1.98   6.97  0.99
   2     8     1   2048     1   1.767   1.93   9.54  0.97
   2    16     1   1632     1   1.750   1.93   9.63  0.97
   2    32     1   1440     1   2.317   1.90   7.27  0.95
   4     4     1   2848     1   1.283   3.73  13.13  0.93
   4     8     1   2048     1   0.950   3.60  17.74  0.90
   4    16     1   1632     1   0.950   3.56  17.74  0.89
   4    32     1   1440     1   1.233   3.57  13.67  0.89
   8     8     1   2048     1   0.567   6.03  29.72  0.75
   8    16     1   1632     1   0.567   5.97  29.72  0.75
   8    32     1   1440     1   0.700   6.29  24.07  0.79
  16    16     1   1632     1   0.367   9.22  45.91  0.58
  16    32     1   1440     1   0.467   9.42  36.08  0.59
  20    32     1   1440     1   0.850   5.18  19.82  0.26

Table III
MINPACK Timing Table

PROB  DIM  NPRO  NBLK  NSTP   MPACK   JAC-P    S'p
   1    2     2     2     5   0.267   0.033   8.09
   2    4     4     4     1   1.100   0.567   1.94
   3    2     2     2     1   2.317   0.250   9.27
   4    4     4     4    10   2.333   0.317   7.36
   5    4     2     2     1   0.783   0.267   2.93
   8    4     2     2     1   0.267   1.750   0.
   9    4     2     2     1   0.133   0.783   0.
   9    8     4     4     1   0.400   2.483   0.
   9   16     4     4     1   1.483   9.633   0.
   9   32     2     2     1   6.483   -       -
  10    4     4     4     1   0.167   0.167   1.
  10    8     8     8     1   0.533   0.217   2.46
  10   16    16    16     1   2.300   0.350   6.57
  10   32    16    32     1  12.783   0.833  15.35
  11    4     2     2     1   -       0.717   ∞
  11    8     8     8     1   -       0.717   ∞
  11   16    16    16     1   -       0.783   ∞
  11   32    16    32     1   -       2.000   ∞
  13    4     4     4     1   0.267   0.317   0.
  13    8     4     4     1   0.750   0.367   2.04
  13   16     8     8     1   2.767   0.450   6.15
  13   32    16    16     1  10.967   0.617  17.77
  14    4     4     4     1   0.583   0.117   -
  14    8     8     8     1   1.450   0.150   -
  14   16     8     8     1   4.700   0.250  18.80
  14   32    16    16     1  16.850   0.367  45.91

Table IV
Jacobi Timing Table

PROB  DIM  NPRO  NBLK  NSTP   JAC-S   JAC-P     Sp
   1    2     2     2     5   0.050   0.033   1.52
   2    4     4     4     1   1.700   0.567   3.00
   3    2     2     2     1   0.383   0.250   1.53
   4    4     4     4     5   1.017   0.317   3.21
   5    4     2     2     1   0.350   0.267   1.31
   8    4     4     4     1   5.667   1.867   3.04
   9    4     4     4     1   2.817   0.883   3.19
   9    8     8     8     1  17.300   3.267   5.30
   9   16     4     4     1  32.567   9.633   3.38
   9   32     -     -     -   -       -       -
  10    4     4     4     1   0.517   0.167   3.10
  10    8     8     8     1   1.100   0.217   5.07
  10   16    16    16     1   3.267   0.350   9.33
  10   32    16    16     1  10.967   0.917  11.96
  11    4     2     2     1   1.350   0.717   1.88
  11    8     8     8     1   4.250   0.717   5.93
  11   16    16    16     1   8.350   0.783  10.66
  11   32    16    32     1  25.417   2.000  12.71
  13    4     4     4     1   0.933   0.317   2.94
  13    8     8     8     1   2.050   0.417   4.92
  13   16     8    16     1   4.400   0.617   7.13
  13   32    16    32     1  10.083   1.117   9.03
  14    4     4     4     1   0.350   0.117   2.99
  14    8     8     8     1   0.700   0.150   4.67
  14   16    16    16     1   1.850   0.283   6.54
  14   32    16    32     1   4.400   0.467   9.42

Table V


We have added several figures to illustrate the behavior of the parallel algorithm; all are plotted on a logarithmic scale. Figure 1 shows that, no matter how many processors we use to solve Problem 9 with n = 8, MINPACK is always faster than the parallel algorithm. Figure 2 (Problem 10 with dimension 16) shows that MINPACK is faster than the serial algorithm, but with two processors the parallel Jacobi algorithm becomes faster than MINPACK. Figures 3 and 4 exhibit a common paradigm in parallel computation: once too many processors are used on a problem, the computation starts running slower. As Figures 3 and 4 show, the optimal number of processors seems to be around 15 for Problems 13 and 14 with n = 32; using 16 processors already represents a loss in speedup. Figure 5 shows, for Problem 10 with n = 32, the timings for all the different partitions; the 16-block and 32-block partitions give almost identical results. In all figures the speedup is close to linear for small numbers of processors, as expected of a parallel algorithm.


[Figure 1 plot: Prob 9, Dim 8, Nblocks 8; axis: time (secs)]

Figure 1.  For this problem the parallel Jacobi algorithm is never faster than MINPACK. The highest speedup is 5.30, using 8 processors.
[Figure 2 plot: Prob 10, Dim 16, Nblocks 16; axis: time (secs)]

Figure 2.  For this problem the parallel Jacobi algorithm is 6.57 times faster than MINPACK. The highest speedup is 9.33, using 16 processors.


[Figure 3 plot: Prob 13, Dim 32, Nblocks 32; axis: time (secs)]

Figure 3.  For this problem the parallel Jacobi algorithm is 9.82 times faster than MINPACK. The highest speedup is 9.03, using 16 processors.
[Figure 4 plot: Prob 14, Dim 32, Nblocks 32; axis: time (secs)]

Figure 4.  For this problem the parallel Jacobi algorithm is 36.08 times faster than MINPACK. The highest speedup is 9.42, using 16 processors.


[Figure 5 plot: Prob 10, Dim 32; axis: time (secs)]

Figure 5.  Timings for different sets of block partitions with different numbers of processors.
We were able to solve 10 of the 14 standard problems for nonlinear equations in [9], so the nonlinear Jacobi algorithm is 71% reliable (10/14) on this set of test problems. This is, in fact, the main disadvantage of the algorithm. On the other hand, in most cases in which the method converges, the parallel implementation outperforms MINPACK. As pointed out earlier, this is partly because function evaluations on this set of test problems are not expensive. As Table IV shows, MINPACK outperforms the parallel algorithm only on problem 9 (and on problems 8 and 13 with dimension 4); otherwise the parallel algorithm is considerably faster than MINPACK, and its advantage seems to grow as the dimension increases. The outstanding performance of the parallel Jacobi algorithm on problems 13 and 14 is due to the structure of their Jacobians, tridiagonal and banded respectively: the nonlinear Jacobi method performs extremely well on problems whose Jacobians are concentrated around the diagonal. Beyond these structured problems, the overall performance is still interesting and very promising. On this set of problems we obtained an average speedup of 10. The speedups in Table V show the considerable improvement that the parallel implementation produces over the standard serial algorithm, with a high of 11.96 using 16 processors on problem 10 with dimension 32.

Only in two problems did we set the parameter $\omega_k$, defined in (2.4), to a value different from 1.0 in order to obtain faster convergence: on Problem 8 and on Problem 10 we used $\omega_k = 0.9$. Table 6 presents the results for Problem 10 with dimension 32 using the standard value $\omega_k = 1.0$. Comparing it with Table II, we notice a considerable gain in speedup over MINPACK for a small change in this parameter.
Problem: 10   Dimension: 32

NPRO  NBLK  NSTP   NFEV  INFO    TIME     Sp    S'p    Ep
   1     1     1   1216     1  12.783
   1     2     1   5504     1  37.683
   1     4     1   3904     1  24.683
   1     8     1   2944     1  18.450
   1    16     1   2272     1  15.050
   1    32     1   1952     1  14.817
   2     2     1   5504     1  18.967   1.99   0.    0.99
   2     4     1   3904     1  12.383   1.99   1.03  1.00
   2     8     1   2944     1   9.300   1.98   1.37  0.99
   2    16     1   2272     1   7.650   1.97   1.67  0.98
   2    32     1   1952     1   7.567   1.96   1.69  0.98
   4     4     1   3904     1   6.383   3.87   2.00  0.97
   4     8     1   2944     1   4.783   3.86   2.67  0.96
   4    16     1   2272     1   3.950   3.81   3.24  0.95
   4    32     1   1952     1   3.917   3.78   3.26  0.95
   8     8     1   2944     1   2.550   7.24   5.01  0.90
   8    16     1   2272     1   2.117   7.11   6.04  0.89
   8    32     1   1952     1   2.133   6.95   5.99  0.87
  16    16     1   2272     1   1.217  12.37  10.50  0.77
  16    32     1   1952     1   1.217  12.18  10.50  0.76
  20    32     1   1952     1   1.283  11.55   9.96  0.58

Table 6


Additional  experiments. 

We tried to make the method more robust and reliable by using multisplitting techniques such as those developed by O'Leary and White [15] for linear systems. The main idea behind this approach is to use more information from the Jacobian at each iteration by using more processors to perform additional computations.

One idea is to use a card-dealer (round-robin) technique to assign the functions $f_i$ to the processors. In this way we are able to use part of the Jacobian matrix that lies outside the diagonal blocks. This particular block partition could be used concurrently with other block partitions, which could be made of blocks of different dimensions. At each iteration we then obtain several solution vectors, one for each partition. To use all this information, we take as the next iterate a convex combination of all the solution vectors. The following figure shows two different partitions running concurrently on a Jacobian of dimension six, using three processors for each partition. If at the kth iteration partition one gives the solution vector $\hat{x}^{1}$ and partition two gives $\hat{x}^{2}$, then the next iterate is $x^{(k+1)} = \tfrac{1}{2}\hat{x}^{1} + \tfrac{1}{2}\hat{x}^{2}$.


[Figure 8 diagram: two block partitions of a 6-by-6 Jacobian, each using three processors]

Figure 8.  Different partitions running concurrently at each iteration.


We tested this idea on problem 9 with dimension 16, using a standard partition of 4
blocks, each of dimension 4, and a second partition of 4 blocks with dimensions 5, 5, 5
and 1.
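The two-partition scheme can be sketched in a few lines. What follows is a minimal Python/NumPy sketch, not the code used in the experiments. The test system F is a hypothetical diagonally dominant stand-in for problem 9. Each diagonal block is solved by Newton's method with a forward-difference Jacobian; this is an assumption, since the block solver is not described here. The blocks are also processed sequentially, whereas the paper assigns them to separate processors. The weights 1/2, 1/2 reproduce the convex combination $x^{k+1} = x_1^k/2 + x_2^k/2$, and the two partitions are the ones used for dimension 16.

```python
import numpy as np


def F(x):
    """Hypothetical test system (not problem 9 of the paper):
    F_i(x) = 4 x_i - x_{i-1} - x_{i+1} + 0.1 x_i**3 - 1, which is diagonally dominant."""
    r = 4.0 * x + 0.1 * x**3 - 1.0
    r[1:] -= x[:-1]
    r[:-1] -= x[1:]
    return r


def solve_block(F, x, idx, tol=1e-12, max_newton=20, h=1e-7):
    """Newton's method on the equations F[idx] with only x[idx] free; the other
    components stay frozen at the current iterate x (the nonlinear Jacobi rule)."""
    y = x.copy()
    for _ in range(max_newton):
        r = F(y)[idx]
        if np.max(np.abs(r)) < tol:
            break
        # Forward-difference approximation of the diagonal block of the Jacobian.
        J = np.empty((idx.size, idx.size))
        for c, j in enumerate(idx):
            yp = y.copy()
            yp[j] += h
            J[:, c] = (F(yp)[idx] - r) / h
        y[idx] -= np.linalg.solve(J, r)
    return y[idx]


def block_jacobi_step(F, x, partition):
    """One nonlinear block-Jacobi step; each block could run on its own processor."""
    x_new = x.copy()
    for idx in partition:
        x_new[idx] = solve_block(F, x, idx)
    return x_new


def multisplit_solve(F, x0, partitions, weights, tol=1e-10, max_iter=200):
    """Run every partition from the same iterate and take a convex combination."""
    x = x0.copy()
    for k in range(max_iter):
        candidates = [block_jacobi_step(F, x, p) for p in partitions]
        x_next = sum(w * c for w, c in zip(weights, candidates))
        if np.max(np.abs(x_next - x)) < tol:
            return x_next, k + 1
        x = x_next
    return x, max_iter


# Partitions from the text for dimension 16: four blocks of 4, and blocks of 5, 5, 5, 1.
p1 = np.split(np.arange(16), 4)
p2 = [np.arange(0, 5), np.arange(5, 10), np.arange(10, 15), np.arange(15, 16)]
x, iters = multisplit_solve(F, np.zeros(16), [p1, p2], [0.5, 0.5])
print(iters, np.max(np.abs(F(x))))
```

Any nonnegative weights summing to one give a convex combination. Equal weights are simply the choice made in the text.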
Problem: 9  Dimension: 16

NPRO  NBLK  NSTP   NFEV  INFO    TIME      Sp     S'p     Ep
   1     8     1  14576     1  47.183       -
   2     8     1  14576     1  27.083    1.74      0.   0.87
   4     8     1  14576     1  14.233    3.32      0.   0.83
   8     8     1  14576     1   9.017    5.23      0.   0.65

Table  7 


The method is still slower than MINPACK. However, if we compare the results of
Table 7 above with those of Table 9 in [8], we notice that we succeed in obtaining
convergence using 8 processors and that the method performs faster than the best case in [8].

We have also experimented with the following idea: every other iteration we use a
different partition with the same number of processors. We may also choose to switch
partitions only every two or every five iterations. One of the partitions may use only one
block, in which case we will be doing a MINPACK step every other iteration. We tried out
this idea on problem 9 with dimension 8. One partition has 8 blocks with dimension 1 each;
the other partition has 2 blocks with dimensions 7 and 1. We change partitions every five
iterations. The results follow.
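The alternating-partition variant only changes which partition is active at iteration k. The sketch below again uses a hypothetical test system rather than problem 9. The partition index is (k // period) mod 2, so with period = 5 the eight 1×1 blocks are used for five iterations and the 7 + 1 partition for the next five. Each block is solved by Newton's method with an analytic Jacobian. When a block covers the whole system this becomes a plain Newton step, which here only stands in for the MINPACK step mentioned in the text; MINPACK itself is not used.

```python
import numpy as np

N = 8  # dimension used in the experiment described above

# Hypothetical test problem (not the paper's problem 9): F(x) = A x + 0.1 x**3 - 1,
# with A tridiagonal (4 on the diagonal, -1 next to it).
A = 4.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)


def F(x):
    return A @ x + 0.1 * x**3 - 1.0


def jac(x):
    return A + np.diag(0.3 * x**2)


def block_newton(x, idx, iters=20):
    """Solve F[idx](y) = 0 for y[idx], holding the other components at x."""
    y = x.copy()
    for _ in range(iters):
        r = F(y)[idx]
        if np.max(np.abs(r)) < 1e-13:
            break
        y[idx] -= np.linalg.solve(jac(y)[np.ix_(idx, idx)], r)
    return y[idx]


def alternating_block_jacobi(x0, partitions, period=5, tol=1e-10, max_iter=500):
    """Nonlinear block Jacobi that switches partition every `period` iterations."""
    x = x0.copy()
    for k in range(max_iter):
        partition = partitions[(k // period) % len(partitions)]
        x_new = x.copy()
        for idx in partition:          # each block would go to its own processor
            x_new[idx] = block_newton(x, idx)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter


# The two partitions from the text: eight 1x1 blocks, then blocks of size 7 and 1.
fine = [np.array([i]) for i in range(N)]
coarse = [np.arange(0, 7), np.arange(7, 8)]
x, iters = alternating_block_jacobi(np.zeros(N), [fine, coarse])
print(iters, np.max(np.abs(F(x))))
```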


Problem: 9  Dimension: 8

NPRO  NBLK  NSTP   NFEV  INFO    TIME      Sp     S'p     Ep
   1     8     1   2255     1   4.700
   2     8     1   2255     1   2.967    1.58      0.   0.79
   4     8     1   2255     1   2.200    2.14      0.   0.53
   8     8     1   2255     1   1.800    2.61      0.   0.33

Table  8 


We also tried a partition of 8 blocks with dimension 1 each and a partition of one
block with dimension 8. Hence, a MINPACK step is performed every five iterations.
Problem: 9  Dimension: 8

NPRO  NBLK  NSTP   NFEV  INFO    TIME      Sp     S'p     Ep
   1     8     1    702     1    1.25
   2     8     1    702     1   0.867    1.44      0.   0.72
   4     8     1    702     1   0.683    1.83      0.   0.46
   8     8     1    702     1   0.583    2.14      0.   0.27

Table  9 


With this technique the method runs more than 4 times faster than with the
standard partition procedure (see [8]). Although MINPACK is still faster (0.400 s), we
have considerably decreased the execution time. It is interesting to note in Tables 8 and 9
that the speedups Sp are not as large as in [8]. This was predictable, since every five
iterations we can have up to seven processors idling.
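The speedup and efficiency columns of Tables 8 and 9 appear to follow the usual definitions S_p = T_1/T_p and E_p = S_p/p, where T_p is the TIME entry for NPRO = p. The short check below assumes those definitions and recomputes both columns from the Table 9 timings.

```python
def speedup_and_efficiency(times):
    """Return {p: (S_p, E_p)} with S_p = T_1 / T_p and E_p = S_p / p, rounded to two
    decimals as in the tables. `times` maps the processor count p to the TIME entry T_p."""
    t1 = times[1]
    result = {}
    for p, tp in sorted(times.items()):
        if p == 1:
            continue
        s = t1 / tp
        result[p] = (round(s, 2), round(s / p, 2))
    return result


# TIME column of Table 9 (seconds), keyed by NPRO.
table9 = {1: 1.25, 2: 0.867, 4: 0.683, 8: 0.583}
print(speedup_and_efficiency(table9))  # {2: (1.44, 0.72), 4: (1.83, 0.46), 8: (2.14, 0.27)}
```

With the Table 8 timings (4.700, 2.967, 2.200 and 1.800 s), the same function returns 1.58/0.79, 2.14/0.53 and 2.61/0.33, which matches the tabulated values.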

9. Conclusions and future work. The parallel implementation of the block nonlinear
Jacobi algorithm has given us better results than we expected. It has given us a
way to solve systems of nonlinear equations in parallel. To our knowledge, this is the first
time a parallel algorithm for solving this type of problem has achieved such performance
on a real parallel computer.

It is interesting to notice that the main idea behind the algorithm is the fundamental
idea behind some powerful parallel algorithms, namely, divide and conquer. Furthermore,
it is worth noticing that although the algorithm is only linearly convergent, it performs
faster than a quadratically convergent algorithm on certain problems with a particular
structure and on certain other problems where the function evaluations are not
expensive.
The main disadvantage of the algorithm is its lack of reliability. With a success rate of
only 71%, it cannot yet be regarded as a reliable way of solving nonlinear equations. However,
there are several ways of improving the convergence of the method at no additional cost.
Along these lines, some preliminary tests were presented at the end of the previous section.
Other approaches are currently being tested and will be part of a future report.

Further testing is certainly necessary. In particular, we will study the
behavior of the algorithm from initial points that are farther away from the solution. We
will also implement the algorithm to solve unconstrained minimization problems.

One of the advantages of using the Monitors macros is that they are portable, and
therefore the code which is running on the Encore Multimax will run on any other
parallel computer where the macros have been installed. This is the case, for instance, for
the Alliant FX/8 located at Argonne National Laboratory. The Alliant is a machine
which is more suitable for numerical computations because it allows one to use concurrency
and vectorization in each processor. We decided to start our experiments on the
Encore Multimax because this machine has 20 processors, in contrast with the Alliant,
which has only 8 processors. The numerical results of these experiments on the Alliant
will appear in a forthcoming report.
1.0 ABSTRACT

This paper discusses the work jointly performed by the U.S. Army Materiel Systems
Analysis Activity, the Computational Engineering Company and the Ballistic Research
Laboratory in the application of time series analysis and modern control theory to
characterize armored vehicle weapon tube flexure. The motivation for performing this
work stems from the fact that gun bend related errors have a significant effect on
fire-on-the-move delivery accuracy. Hence the ability to predict the precise location of the
weapon’s muzzle as a function of time, in terms of its past and current history as well as
other sensor measurements, could significantly enhance weapon system accuracy.

Previous efforts to develop muzzle flexure prediction algorithms have generally
relied on purely deterministic techniques. That is, gun flexure was mathematically
characterized by deterministic differential equations that were a function of such parameters
as weapon angle position, rate and acceleration, linear acceleration, and bending of
the gun tube. In the case of gun dynamics, this approach tended to be unsuitable for
practical implementation because:

o  mathematically  they  may  be  extremely  complicated, 

o  they  do  not  account  for  modeling  and  measurement  uncertainties,  and 

o  they  lack  the  robustness  of  being  adaptive. 

This  study  discusses  the  preliminary  work  that  has  been  performed  to  develop  practical 
algorithms  that  address  the  above  problems.  The  overall  approach  was  to: 
o  apply time series analysis techniques to strain gage and other test data
obtained from the M1 Combat Tank mounting a 105mm weapon system and tested over
a special Aberdeen Proving Ground Test Course,

o  develop  auto-regression/moving  average  (AR/MA)  models  of  the  test  data  to 
characterize  dynamic  weapon  flexure,  and 

o  convert  the  AR/MA  models  to  adaptive  Kalman  filter  prediction  algorithms. 

The paper concludes with a discussion of future modeling and field testing necessary
to refine the existing Kalman filter/predictor algorithms and to incorporate a physical
model into the filter structure.

2.0  INTRODUCTION 

Since  gun  barrel  bending  contributes  significantly  to  the  total  projectile  error 
budget,  efforts  to  predict  the  precise  location  of  the  gun  muzzle  as  a  function  of  time  in 
terms  of  its  past  history  could  significantly  improve  the  accuracy  of  the  weapon  system. 
Previous  efforts  by  the  U.S.  Army  Ballistic  Research  Lab  (BRL)  to  develop  precision  aim 
techniques  (PAT)  have  used  a  deterministic  approach.  Specifically,  the  gun  motion  was 
assumed to be described as a function of gun turret angles, angular rates and accelerations,
tank vertical acceleration and bending (and bending rates) of the gun tube. The
differential  equation  used  to  predict  the  position  of  the  gun  muzzle  at  projectile  exit  was 
derived  using  simple  geometry  and  the  equation  for  the  fundamental  bending  mode. 
Although  the  deterministic  approach  is  promising,  it  has  not  performed  well  in  field  tests 
at  longer  (e.g.,  20  milliseconds)  in-bore  times. 

An  alternate  approach  is  to  use  only  strain  gauge  (gun  tube  deflection)  and  servo 
error  data  and  model  the  gun  deflection  as  a  Markov  process:  a  linear  system  driven  by 
white  noise.  It  is  this  stochastic  approach  which  was  investigated  here. 

In  this  paper  we  describe  an  adaptive  model  identification  algorithm  for  predicting 
gun deflection as the projectile leaves the tube. The adaptability is necessary because of
the potentially great variability in the gun motion due to tank velocity or variation in terrain
(e.g., surfaced road to rough ground). Efficient operation is desirable, as the algorithm
could possibly serve as the basis of a real-time gun inhibit algorithm.

The  paper  is  in  six  sections.  In  Section  3,  a  discussion  of  the  data  utilized  for 
modeling  is  given.  In  Section  4,  the  technical  approach  is  outlined  and  the  method  used 
for  evaluating  the  algorithm  is  described  in  Section  5.  The  paper  is  concluded  in  Section 
6  by  a  brief  discussion  and  suggestions  for  future  investigations. 

3.0  TEST  DATA 

The available data was obtained from strain gauge and digital control transformer
(DCT) sensors mounted on the gun tube of a heavy tank in wide use by the Army. The
strain gauge measured the gun tube deflection while the DCT measured the angle of the
gun tube with respect to the turret (see Figure 1). Both sensors were sampled at 250 Hz
(4 milliseconds).

[Figure 1 image: gun tube deflection geometry; the turret is labeled.]

Figure 1. Geometry of Gun Tube Deflection

There were four tests available for analysis. Each test was conducted on the Profile
IV bump course at Aberdeen Proving Ground at speeds of 5, 15, 22, and 30 miles per
hour (one test at each speed). The course consists of approximately 460 feet of triangular
and small wooden bumps up to 12 inches high, with gravel lead-in and exit areas.
The Profile IV course is considered to be one of the most severe tests of a tank’s ability
to point the gun accurately while traversing rough terrain.

4.0 TECHNICAL APPROACH

We next describe the technical approach implemented for strain gauge, DCT, and
resultant muzzle error identification and prediction. The basic approach was to model
the data as a Gauss-Markov process. Specifically, based on the spectral analysis (discussed
in 4.1), it was assumed that the data was best modeled as an autoregressive moving
average (ARMA) model (defined below) whose parameters are selected under the
criterion of minimizing the prediction error. A good summary of ARMA modeling techniques
can be found in [1].

4.1 Data Analysis

The data was first carefully examined to determine general characteristics and to
identify statistically (locally) stationary segments. By comparing the data to a schematic
of the bump course, segments representative of various physical situations or environments
could be selected. These segments provided the means to investigate the spectral
content as well as to evaluate the eventual design (see Section 5).

Computation of data power spectra using the periodogram and the maximum
entropy method (MEM) was performed to identify the dominant spectral bands and the
bandwidth of these spectra. This analysis proved useful in relating the observed spectral
content to the physical effects as well as in determining the appropriate model
and approximate model order. Further, an important conclusion based on the spectral
analysis is that tank speed had little effect on observed spectral frequencies. An estimate
of the power spectrum of the muzzle error using MEM for a segment consisting primarily
of small bumps is given in Figure 2.

4.2 Identification

It is assumed that an ARMA(p,q) model is sufficiently general to model both the
DCT and strain gauge data. In equation (1), a_i, i = 1, 2, ..., p and b_i, i = 1, 2, ..., q denote
respectively the autoregressive (AR) and moving average (MA) coefficients, p and q
are the AR and MA orders, and w(k) is a zero mean, unit variance Gaussian white noise
sequence.

$$z(k) = -\sum_{i=1}^{p} a_i\, z(k-i) + \sum_{i=1}^{q} b_i\, w(k-i) + b_0\, w(k) \qquad (1)$$

The use of the model for prediction therefore initially requires estimation of the
orders and coefficients. The autoregressive order (p) was estimated using a technique
due to Cadzow [2] based on determining the effective rank of an associated overdetermined
ARMA autocorrelation matrix. (The term overdetermined refers to AR and MA
orders selected for estimation which are much larger than the true unknown orders.) As
far as could be determined, no similar technique is available for estimating the MA order,
and for this reason Cadzow’s suggestion of simply setting the MA order equal to the AR
order was implemented.

As is well known [3]-[4], estimation of the AR coefficients under a least squares criterion
results in a linear system of equations to be solved. However, MA estimation leads to
a nonlinear system. For this reason, the basic approach to coefficient estimation was
to approximate the ARMA process by a large order autoregression. (Note that any
invertible ARMA process can be represented by an AR process of possibly infinite order.)
This viewpoint was adopted due to the severe computational constraints. The technique
implemented essentially follows an approach described by Graupe et al. [5], which
involves estimating the coefficients of a high-order AR model and transforming it to a
lower order ARMA model. The AR coefficients were computed from MEM utilizing
Burg’s algorithm [6].
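The AR fitting step can be illustrated with a short NumPy sketch of Burg's recursion. It uses the sign convention of equation (1) with q = 0 and b_0 = 1. It is exercised on a hypothetical simulated AR(2) process, not on the tank data. The subsequent reduction of the high-order AR model to a lower order ARMA model following Graupe et al. [5] is not shown.

```python
import numpy as np


def simulate_ar(a, n, rng, burn_in=500):
    """Simulate z(k) = -sum_i a_i z(k-i) + w(k), i.e. equation (1) with q = 0 and b_0 = 1."""
    a = np.asarray(a, dtype=float)
    p = a.size
    total = n + burn_in
    z = np.zeros(total)
    w = rng.standard_normal(total)
    for k in range(p, total):
        # z[k-p:k][::-1] is [z(k-1), z(k-2), ..., z(k-p)].
        z[k] = -a @ z[k - p:k][::-1] + w[k]
    return z[burn_in:]  # discard the start-up transient


def burg(x, order):
    """Burg's method. Returns [1, a_1, ..., a_p] and the final prediction-error power,
    using the convention z(k) + a_1 z(k-1) + ... + a_p z(k-p) = e(k) of equation (1)."""
    x = np.asarray(x, dtype=float)
    f = x[1:].copy()    # forward errors f_m(n), aligned with ...
    b = x[:-1].copy()   # ... backward errors b_m(n-1)
    a = np.array([1.0])
    power = x @ x / x.size
    for _ in range(order):
        # Reflection coefficient minimizing forward plus backward error energy.
        k = -2.0 * (f @ b) / (f @ f + b @ b)
        a = np.append(a, 0.0)
        a = a + k * a[::-1]          # Levinson-type coefficient update
        f, b = f + k * b, b + k * f  # order-update both error sequences
        f, b = f[1:], b[:-1]         # realign for the next stage
        power *= 1.0 - k * k
    return a, power


rng = np.random.default_rng(0)
# Hypothetical AR(2) test process: z(k) = 1.5 z(k-1) - 0.7 z(k-2) + w(k).
z = simulate_ar([-1.5, 0.7], 20000, rng)
a_hat, power = burg(z, 2)
print(a_hat, power)
```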

4.3  Prediction. 

As discussed, accurate firing of the tank requires prediction of the muzzle error at
some future time instant. The length of the prediction step depends on, for example,
the type of round and the length of the gun. This problem can be stated more formally
as the optimal prediction of the ARMA process at step k + n based on data up
to step k.

Because of its many desirable features, the prediction method employed was the
Kalman filter [7]. To utilize the Kalman filter, it was first necessary to convert the
ARMA process to state space form. By defining the state equation

$$x(k+1) = A\,x(k) + B\,w(k) \qquad (2)$$

where 


then  the  observation  equation 

$$z(k) = [1, 0, \ldots, 0]\,x(k) + b_0\,w(k) \qquad (3)$$

describes the ARMA process. In the above, the estimated orders and coefficients are
utilized. Note that from (2)-(3), the process noise is correlated with the observation
noise. In order to avoid the increased complexity incurred in the correlated noise case,
an equivalent augmented system was implemented which removed the "measurement"
noise. In any case, the Kalman filter recursively computes the conditional expectation:

$$\hat{x}(k|k) = E[\,x(k) \mid z(0), \ldots, z(k)\,]$$

which is the minimum variance estimate (the estimate which produces the smallest
variance of the difference between the state and the estimate based on the observations).
The ARMA estimate is immediate from (3):


$$\hat{z}(k|k) = [1, 0, \ldots, 0]\,\hat{x}(k|k)$$


and because the state matrix A does not depend on time, it can be shown that

$$\hat{x}(k+n|k) = A^{n}\,\hat{x}(k|k) \qquad (4)$$

The methodology was applied to the cases n = 3 (12 ms) and n = 5 (20 ms). It is
important to emphasize that while equation (4) is optimal, the quality of the estimate
deteriorates as n grows large.
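Equation (4) is easiest to see for a pure AR model observed without measurement noise. In that case the state can be taken as the last p observations, so the filtered estimate $\hat{x}(k|k)$ is the data itself. The sketch below assumes this special case and uses a companion-form state matrix, which need not be the realization used in the paper. It omits the Kalman filter update needed for the general ARMA model.

```python
import numpy as np


def companion(a):
    """State matrix A for z(k) = -sum_i a_i z(k-i) + w(k) with state
    x(k) = [z(k), z(k-1), ..., z(k-p+1)], so x(k+1) = A x(k) + [1, 0, ..., 0]^T w(k+1)."""
    a = np.asarray(a, dtype=float)
    p = a.size
    A = np.zeros((p, p))
    A[0, :] = -a                   # first row carries the AR coefficients
    A[1:, :-1] = np.eye(p - 1)     # sub-diagonal shifts the delayed values down
    return A


def predict_n_steps(a, x_kk, n):
    """zhat(k+n|k) = [1, 0, ..., 0] A^n xhat(k|k), as in equation (4).

    For a pure AR model observed without noise, xhat(k|k) is simply the last p
    observations (newest first), so no Kalman update is needed in this special case."""
    A = companion(a)
    return float((np.linalg.matrix_power(A, n) @ np.asarray(x_kk, dtype=float))[0])


# n = 5 steps corresponds to 20 ms at the 250 Hz (4 ms) sampling rate of Section 3.
print(predict_n_steps([-1.5, 0.7], [1.0, 0.5], 5))
```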

4.4  Adaptive  Estimation 

Implementation of the prediction algorithm, shown in Figure 3, consists of estimating
the ARMA order and coefficients using data during a "training" interval, followed by
prediction for a short interval following the training. By training continuously, the algorithm
provides an adaptive algorithm for prediction. The approach was considered not
only for its simplicity, but also for its (comparatively) small computational burden. An
estimate of the computational burden was made for a simplified (reduced order) version
of the algorithm, and it appears that it can operate in real time on a DEC MicroVax II.
However, to realistically measure the true computational requirements, the algorithm
was extensively evaluated.


Figure  3.  Adaptive  Filtering  Prediction  Method 
[Figure 4 plot: muzzle pointing error and its prediction; the prediction interval is marked.]

Figure 4. Muzzle Pointing Error and Prediction


5.0 ALGORITHM EVALUATION


The implementation of the algorithm required selection of certain "variables", such
as the number of autoregression coefficients (N) to use in the approximation or the
length of the training interval. Due to the requirement of computational efficiency, it
was of interest to set variables providing acceptable performance while yielding the shortest
possible run-time. Initially, values which resulted in good identification and prediction
were chosen; then the values were altered in a systematic manner until a "minimal"
set was obtained.

5.1  Experimental  Baseline 

To evaluate the performance as well as the limitations of the ARMA approach, the
methodology was applied to a variety of representative data segments from the four
tests. The segments were selected to provide typical (modeling and prediction over similar
data) as well as atypical (modeling and prediction over different data) conditions.
To be more precise, by choosing a variety of training and prediction interval combinations,
the approach was tested under different physical "scenarios" associated with the
tank traversing different portions of the track. Since the track is composed of regions
consisting primarily of small bumps or large bumps, six different combinations were
identified. For example, one combination resulting in a typical condition is training and
prediction over data consisting primarily of large bumps. An atypical condition would
result from training over large bumps and predicting over a segment consisting of small
bumps. The ability of the algorithm to predict the data for a typical case is shown in
Figure 4.

To evaluate the quality of the muzzle pointing error prediction, the sample standard deviation
(RMS) of the error residual sequence e(k),

$$e(k) = z(k) - \hat{z}(k|k-n)$$

was computed. The error RMS was computed both over the entire data segment
and only over the zero crossings: the points at which the prediction is within a 1/10
milliradian band of zero. This latter statistic is important, as the gunner will be allowed
to fire only at a predicted zero crossing. For comparison, the RMS of the data
over the entire segment was also computed.
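The Section 5.1 statistics are straightforward to compute once the n-step predictions are aligned with the measurements. This sketch makes two assumptions. First, a "zero crossing" is taken to be any instant where the absolute value of the prediction is at most 0.1 mrad, which is one reading of the 1/10 milliradian band. Second, RMS is computed about zero rather than about the sample mean.

```python
import numpy as np


def rms(x):
    """Root mean square about zero."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x**2)))


def evaluate_prediction(z, z_pred, band=0.1):
    """Error statistics in the spirit of Section 5.1.

    z      : measured muzzle pointing error (mrad) at times k
    z_pred : n-step predictions zhat(k|k-n), aligned element by element with z
    band   : half-width (mrad) of the band treated as a "zero crossing" (assumed)
    """
    z = np.asarray(z, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    e = z - z_pred                      # residual e(k) = z(k) - zhat(k|k-n)
    near_zero = np.abs(z_pred) <= band  # instants at which firing would be allowed
    return {
        "data_rms": rms(z),
        "error_rms": rms(e),
        "error_rms_at_zero_crossings": rms(e[near_zero]) if near_zero.any() else float("nan"),
    }


print(evaluate_prediction([0.3, -0.2, 0.05, 0.4], [0.2, -0.05, 0.0, 0.5]))
```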

Results  for  the  baseline  set  of  experiments  show  that  the  algorithm  provides  an 
average  error  reduction  of  32%  for  the  typical  and  23%  for  the  atypical  segments.  The 
most  dramatic  reductions  often  occur  at  the  higher  speeds. 

5.2  Reduction  of  Algorithm  Run  Time 

Since the algorithm achieved suitable performance on the baseline segments, the
parameter values of the algorithm were next individually varied to obtain shorter run times.
Specifically, the effect of reducing the training interval, the order of the autoregressive
approximation (N), and the estimated AR and MA orders was measured. In order to
quantify the effect of the various changes, the run time of each subroutine of the algorithm
was calculated with a timing program. Execution time is most sensitive to ARMA
order, as it is directly related to the Kalman filter computation, often the most numerically
expensive portion of the algorithm. Examples of results utilizing reduced values
are given in Table 1. A typical run which used two seconds of data to establish the
model, followed by prediction over one second of data, required approximately 4 to 5.5
seconds on a DEC MicroVax II.


Table  1.  Results  of  Variable  Testing  (2  sec  training) 

SPEED         CASE*   RAW DATA   BEST MODEL          REDUCED ORDER
(MPH)                 RMS        PREDICTION ERROR    PREDICTION ERROR
  5             A     0.3 mr     0.2 mr (15)         0.2 mr (3)
                B     0.3        0.2 (1)             0.2 (1)
                C     0.2        0.2 (7)             0.2 (1)
[illegible]     A     0.3        0.3 (1)             0.3 (1)
                B     0.7        0.3 (4)             0.3 (1)
                C     0.5        0.4 (1)             0.4 (1)
 30             A     0.6        0.4 (17)            0.4 (8)
                B     1.0        0.3 (9)             0.3 (4)
                C     0.9        0.4 (3)             0.4 (3)

*Cases:

A: Train on Small Bumps, Predict on Small Bumps
B: Train on Large Bumps, Predict on Large Bumps
C: Train on Small Bumps, Predict on Large Bumps

6.0  DISCUSSION  AND  FUTURE  WORK 

Considering the limited scope of this study (restriction to ARMA models and limited
data types), the results are quite encouraging. For the baseline, the adaptive 20
millisecond ARMA predictor was able to reduce the total muzzle pointing error, usually
by 20% to 60%, for the expected operational conditions (typical scenarios), and even
for the atypical cases there was often a small to moderately large reduction. Usually,
errors are between 0.1 and 0.4 milliradians. By reducing the training interval and forcibly
decreasing the order of the model, a version of the algorithm with comparable performance
was obtained which appears capable of real-time operation on commercially
available microprocessors. It is almost certain that VHSIC technology will make real-time
operation feasible.

Since the ARMA work was completed, an alternative technique, Canonical Variate
Analysis (CVA), was applied to the strain gage test data. The CVA approach is a
method for identifying the observable dynamics, or states, from empirical data. The
algorithm is automatic and completely "data driven": no a priori modeling is required. A
brief overview of the method is provided here.

The CVA algorithm provides, directly from the data, a state-space representation of
the underlying system generating the data. That is, CVA estimates all relevant quantities
of a system of the form:


$$x(k+1) = T\,x(k) + G\,u(k) + w(k) \qquad (5)$$

$$y(k) = H\,x(k) + A\,u(k) + B\,w(k) + v(k)$$


where:

y(k) is the output,
x(k) is the state of the system,
T is the system transition matrix,
u(k) is an input vector, and

w(k) and v(k) are independent white noise processes with covariance matrices Q and R,
and G, H, A, and B are dynamic matrices. The salient difference between ARMA
modeling and CVA is that both the plant noise w(k) and the measurement noise v(k) are
estimated.

Figure 5 depicts the CVA modeling and prediction process. The model obtained
through CVA techniques is inherently adaptive and robust in that, as changes occur in
the driving forces, a new model, very possibly of a different order, can be determined.
Under the assumption that the modeling procedure can be performed quickly and frequently,
a very accurate representation of the observable dynamics is consistently available. As
shown in Section 4.3, an optimal (minimum mean square error) prediction of
the state is immediate from the Kalman filter.
Figure 5. Dynamic Modeling and Prediction Using CVA

Figure 6 depicts a comparison of CVA prediction versus that of the optimum
ARMA model. Also shown is the actual muzzle motion. The superior prediction capability
of the CVA is evident.

Although  the  ARMA  and  CVA  approaches  show  real  promise,  further  work  is 
required  to  refine  the  techniques  and  to  investigate  alternate  forms  of  the  adaptive  filter. 
In  particular,  there  is  potential  for  great  improvement  if  the  ARMA  or  CVA  modeling  is 
augmented  with  physical  modeling  of  the  gun  tube/turret  and  if  additional  data  such  as 
accelerometer/gyro  or  gunner  servo  error  is 
A NON-RECTANGULAR SAMPLING PLAN FOR
ESTIMATING  STEADY-STATE  MEANS 
Peter  W.  Glynn 

Department  of  Operations  Research 
Stanford  University 
Stanford,  CA  94305 


Abstract 

The  method  of  multiple  replicates  is  frequently  used  by  simulators  to  estimate  the 
steady-state  mean  of  a  stochastic  simulation.  One  important  advantage  of  this  approach 
is  that  it  is  easily  adapted  to  a  parallel  computer.  Unfortunately,  the  method  of  multiple 
replicates  is  quite  sensitive  to  contamination  by  “initial  bias.”  In  this  paper,  a  new  type  of 
sampling plan is described. It retains the replication flavor, yet attenuates the bias problem.
It is shown that the new method reduces mean square error relative to conventional
multiple  replicates  for  problems  in  which  the  “initial  transient”  decays  slowly. 

Keywords: Simulation, replication, mean square error, parallel computation.
Introduction


Let $Y = (Y(n) : n \ge 0)$ be a real-valued stochastic sequence corresponding to the output
of a stochastic simulation. We assume that Y is ergodic, in the sense that there exists a
finite (deterministic) constant r such that

$$\frac{1}{n} \sum_{j=0}^{n-1} Y(j) \to r$$

as $n \to \infty$. The steady-state simulation problem concerns the question of estimating the
parameter r efficiently, and providing confidence intervals for r.

Basically,  two  alternative  approaches  for  dealing  with  this  problem  have  been  studied 
in  the  literature.  One  approach  is  known  as  the  method  of  multiple  replicates.  The  idea 
here  is  to  generate  m  independent  replicates  of  the  process  Y.  Each  replicate  is  simulated 
for t time units. The advantage of this method is that it gives rise to independent observations;
this significantly simplifies the problem of producing confidence intervals for
r.  Furthermore,  given  access  to  a  parallel  computing  environment,  one  can  assign  each 
independent  replicate  to  a  different  processor.  Thus,  the  method  of  multiple  replicates  is 
well  suited  to  parallel  computation. 

A disadvantage of this approach is that each of the m independent replicates is contaminated
by initial bias. This initial bias arises from the fact that each of the m replicates
is initiated with an initial condition that is atypical of the steady state of the system. If
we view the first u time units of each replicate as representing an “initial transient” for the
system, this analysis suggests that mu time units of the total time simulated are contaminated
by initial bias. If m is large, we find that the method of multiple replicates devotes
a significant amount of computation to the generation of highly biased observations. This is,
of course, undesirable.

In response to this, we can consider sampling plans in which only one replicate of
Y is generated. Such a strategy is known in the literature as a single replication method.
Here, only the first u time units of the simulation are significantly biased, and there is no
magnification effect by the parameter m. On the other hand, construction of confidence
intervals for r is now complicated by the fact that all the observations collected are
autocorrelated. Furthermore, it is now a non-trivial task to make an assignment of parallel
processors that will significantly speed up the simulation.

Note that the method of multiple replicates involves factoring a computer time budget
T into m replicates, each of length t = T/m. If we view the data of the i'th replicate as being
assigned to the i'th row of a matrix, we obtain a rectangular m × t matrix which summarizes
the data generated by the simulation. Consequently, we refer to the method of multiple
replicates as a rectangular sampling plan for estimating steady-state means (see Figure 1).
Of course, a single replicate method is the special case of a rectangular scheme in which
the data corresponds to a 1 × T row vector.

In this paper, we consider these rectangular methods in greater detail. We also propose
and analyze a new non-rectangular sampling scheme, which attempts to offer an
advantageous compromise between the methods of single and multiple replicates.

The organization of this paper is as follows. Section 2 provides a reasonably complete
mean square error analysis of conventional rectangular sampling plans. In Section 3, the
non-rectangular plan is introduced and studied. Section 4 offers some conclusions.


2.  Rectangular  Sampling  Plans 

We start by describing the traditional method of replication for solving the steady-state
simulation problem. To simplify the discussion that follows, we will assume that in
x units of computer time, precisely x time units of the process Y can be simulated. Thus,
given a total computer time budget of size T, we can implement a rectangular sampling
plan in the following way:

1.) Choose the number m of independent replicates. (If m = 1, this is a single replication
method.)

2.) Choose the (deletion) parameter s from the interval [0, T/m]. (The first s time units of
each replication will be deleted from the set of observations.)

3.) Generate m independent copies Y_1, Y_2, ..., Y_m of the process Y. Each copy is simulated
over the interval [0, T/m].

4.) Set t = ⌊T/m⌋ and compute the estimator

$$\bar{Y}(m,s,T) = \frac{1}{m(t-s)} \sum_{i=1}^{m} \sum_{j=s+1}^{t} Y_i(j).$$
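As a rough illustration of steps 1.) through 4.), the following Python sketch runs a rectangular plan on a toy output process. The process (an AR(1) sequence Y(n+1) = φY(n) + N(0,1) started at Y(0) = 10, for which r = 0), the identification of one unit of computer time with one simulated time step, and the helper names `ar1_path` and `rectangular_estimate` are assumptions made for illustration only.

```python
import random


def ar1_path(length, x0, phi, rng):
    """Return [Y(0), ..., Y(length-1)] for Y(n+1) = phi*Y(n) + N(0, 1), Y(0) = x0.

    Illustrative assumption: the Markov chain X is Y itself and f is the identity,
    so the steady-state mean r is 0 whenever |phi| < 1.
    """
    path, y = [], x0
    for _ in range(length):
        path.append(y)
        y = phi * y + rng.gauss(0.0, 1.0)
    return path


def rectangular_estimate(T, m, s, x0=10.0, phi=0.9, seed=1):
    """Rectangular plan: m independent replicates on [0, T/m]; observations
    Y(0), ..., Y(s) of each replicate are dropped before averaging."""
    t = T // m                                # step 4: t = floor(T/m)
    if not 0 <= s < t:
        raise ValueError("need 0 <= s < t = T // m")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):                        # step 3: m independent copies of Y
        y = ar1_path(t + 1, x0, phi, rng)     # Y(0), ..., Y(t)
        total += sum(y[s + 1:])               # keep Y(s+1), ..., Y(t)
    return total / (m * (t - s))
```

For example, `rectangular_estimate(T=10_000, m=10, s=50)` averages 10 replicates after discarding Y(0), ..., Y(50) in each. Because the replicates are generated independently, the outer loop is the natural unit to hand to separate processors.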

We will now consider the mean square error (MSE) of the estimator Ȳ(m, s, T). The
MSE criterion is often viewed as the most important quantitative measure of the quality
of an estimator. We start with the well known MSE decomposition formula


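$$\mathrm{MSE}(\bar{Y}(m,s,T)) = \operatorname{var}\bar{Y}(m,s,T) + \big(\operatorname{bias}\bar{Y}(m,s,T)\big)^2. \tag{2.1}$$

By using the independence of the replicates, we observe that

$$\operatorname{var}\bar{Y}(m,s,T) = \frac{1}{m}\operatorname{var}\Big(\frac{1}{t-s}\sum_{j=s+1}^{t} Y(j)\Big), \tag{2.2}$$

$$\operatorname{bias}\bar{Y}(m,s,T) = \frac{1}{t-s}\sum_{j=s+1}^{t} E\,Y(j) - r. \tag{2.3}$$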
Figure 1. A Rectangular Sampling Plan (rows: Replication 1, Replication 2, ..., Replication m,
each beginning with its deleted observations).

Figure 2. The Non-rectangular Sampling Plan (rows: Replication 0, Replication 1,
Replication 2, ..., Replication m).


In order to analyze the terms appearing on the right-hand sides of (2.2) and (2.3), we
will assume that Y(n) can be expressed as a real-valued functional of a time-homogeneous
Markov chain X(n), so that Y(n) = f(X(n)) for some real-valued f defined on the state space
S of X. The set S may be discrete or continuous. Continuous state space is particularly
convenient in the analysis of discrete-event simulations. The generalized semi-Markov process
(GSMP) view of discrete-event systems shows that very general discrete-event simulations
may be expressed in the form Y(n) = f(X(n)) with X Markov, provided that we permit
continuous state space.

For x ∈ S, u ≥ 1, let v(x, u) be the conditional variance defined by

$$v(x,u) = \operatorname{var}\Big(\frac{1}{u}\sum_{j=0}^{u-1} Y(j)\,\Big|\,X(0)=x\Big).$$

Similarly, let b(x, u) be the conditional bias given by

$$b(x,u) = E\Big(\frac{1}{u}\sum_{j=0}^{u-1} Y(j)\,\Big|\,X(0)=x\Big) - r.$$


Let μ(·) = P{X(0) ∈ ·} be the initial distribution of X. The Markov property permits us to
re-express (2.3) as

$$\operatorname{bias}\bar{Y}(m,s,T) = E_\mu\, b(X(s+1),\, t-s), \tag{2.4}$$

where E_μ(·) denotes the expectation operator conditional on X(0) having distribution μ.

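To obtain a similar expression for the variance term (2.2) requires more care. We first
apply the well known variance decomposition formula

$$\operatorname{var}_\mu\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\Big) = E_\mu\operatorname{var}\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\,\Big|\,X(s+1)\Big) + \operatorname{var}_\mu E\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\,\Big|\,X(s+1)\Big). \tag{2.5}$$

Clearly, we have

$$\operatorname{var}\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\,\Big|\,X(s+1)\Big) = v(X(s+1),\, t-s),$$

$$E\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\,\Big|\,X(s+1)\Big) = b(X(s+1),\, t-s) + r.$$

Plugging these expressions into (2.5) yields

$$\operatorname{var}_\mu\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\Big) = E_\mu v(X(s+1),\, t-s) + \operatorname{var}_\mu b(X(s+1),\, t-s), \tag{2.6}$$

where var_μ(·) denotes the variance operator conditional on X(0) having distribution μ.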

Suppose that X is a positive recurrent Markov chain possessing a unique invariant
probability distribution π. A large class of such chains has the property that, under suitable
regularity conditions,

$$\sup_{h\in\mathcal{H}}\big|E_\mu h(X(s)) - E_\pi h(X(0))\big| = O(e^{-\alpha s})$$

for some α > 0, where ℋ is some appropriately defined family of real-valued functions
h: S → ℝ. (See NUMMELIN (1984), p. 120, for an example of such a theorem.) Assuming
that the functions v(·, u), b(·, u), and b²(·, u) belong to ℋ for all u ≥ 1, we obtain the relations

$$E_\mu v(X(s+1),\, t-s) = E_\pi v(X(0),\, t-s) + O(e^{-\alpha s}), \tag{2.7}$$

$$E_\mu b(X(s+1),\, t-s) = E_\pi b(X(0),\, t-s) + O(e^{-\alpha s}), \tag{2.8}$$

$$E_\mu b^2(X(s+1),\, t-s) = E_\pi b^2(X(0),\, t-s) + O(e^{-\alpha s}), \tag{2.9}$$

where the constants implicit in each of the "big Oh" terms are independent of t.

Furthermore, for such a recurrent Markov chain, it is typically the case that the steady-state
mean r can be expressed in the form r = E_π f(X(0)). As a consequence of the stationarity
of X under initial distribution π, it is evident that E_π Y(n) = r for n ≥ 0 and hence
E_π b(X(0), t - s) = 0. Thus, (2.8) can be simplified to

$$E_\mu b(X(s+1),\, t-s) = O(e^{-\alpha s}). \tag{2.10}$$

Combining (2.9) and (2.10) (recall that var_μ b = E_μ b² - (E_μ b)²), we obtain

$$\operatorname{var}_\mu b(X(s+1),\, t-s) = E_\pi b^2(X(0),\, t-s) + O(e^{-\alpha s}). \tag{2.11}$$

(Again, the constants implicit in (2.10) and (2.11) are independent of t.)

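Combining (2.6), (2.7), and (2.11), we obtain the expression

$$\operatorname{var}_\mu\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\Big) = E_\pi v(X(0),\, t-s) + E_\pi b^2(X(0),\, t-s) + O(e^{-\alpha s}).$$

Repeating the variance decomposition (2.6) under var_π(·), we find that

$$\operatorname{var}_\pi\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\Big) = E_\pi v(X(0),\, t-s) + E_\pi b^2(X(0),\, t-s)$$

and hence

$$\operatorname{var}_\mu\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\Big) = \operatorname{var}_\pi\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\Big) + O(e^{-\alpha s}). \tag{2.12}$$

To simplify (2.12), we again use the fact that X is stationary under initial distribution
π. Set Y_c(n) = Y(n) - r,

$$\sigma^2 = E_\pi Y_c(0)^2 + 2\sum_{k=1}^{\infty} E_\pi Y_c(0)Y_c(k), \qquad \eta = 2\sum_{k=1}^{\infty} k\, E_\pi Y_c(0)Y_c(k).$$

Under appropriate summability hypotheses (see, for example, p. 172 of BILLINGSLEY
(1968)), we can use the stationarity to write

$$\operatorname{var}_\pi\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\Big) = \frac{\sigma^2}{t-s} - \frac{\eta}{(t-s)^2} - \frac{2}{t-s}\sum_{k=t-s}^{\infty}\Big(1 - \frac{k}{t-s}\Big)E_\pi Y_c(0)Y_c(k). \tag{2.13}$$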
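For a concrete check of (2.13), one can take an illustrative stationary AR(1) sequence with unit innovation variance (our assumption, not a model treated in the text), for which E_π Y_c(0)Y_c(k) = φ^k/(1 - φ²), σ² = 1/(1 - φ)², and η = 2φ/((1 - φ²)(1 - φ)²). The sketch below compares the variance of the average of n = t - s consecutive observations, summed term by term, with the right-hand side of (2.13); the infinite tail sum is truncated, which is harmless here because the covariances decay geometrically.

```python
def ar1_autocov(k, phi):
    """R(k) = E_pi Y_c(0) Y_c(k) for a stationary AR(1) with unit innovation variance."""
    return phi ** abs(k) / (1.0 - phi ** 2)


def ar1_sigma2_eta(phi):
    """sigma^2 = R(0) + 2 sum_{k>=1} R(k) and eta = 2 sum_{k>=1} k R(k), in closed form."""
    sigma2 = 1.0 / (1.0 - phi) ** 2
    eta = 2.0 * phi / ((1.0 - phi ** 2) * (1.0 - phi) ** 2)
    return sigma2, eta


def var_of_mean_direct(n, phi):
    """var_pi of the average of n consecutive observations: [n R(0) + 2 sum (n-k) R(k)] / n^2."""
    total = n * ar1_autocov(0, phi)
    total += 2.0 * sum((n - k) * ar1_autocov(k, phi) for k in range(1, n))
    return total / n ** 2


def var_of_mean_2_13(n, phi, tail_terms=5000):
    """Right-hand side of (2.13) with n = t - s; the sum over k >= n is truncated."""
    sigma2, eta = ar1_sigma2_eta(phi)
    tail = sum((1.0 - k / n) * ar1_autocov(k, phi) for k in range(n, n + tail_terms))
    return sigma2 / n - eta / n ** 2 - 2.0 * tail / n
```

The two functions agree to rounding error for any n, and the tail term shrinks like φ^n, which is the behavior that (2.16) and (2.17) formalize.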

Note that

$$\big|E_\pi Y_c(0)Y_c(k)\big| \le \int_S |f(x) - r|\cdot\big|E_x Y_c(k)\big|\,\pi(dx), \tag{2.14}$$

where E_x(·) is the expectation operator conditional on X(0) = x. We now observe that
E_x Y_c(k) = E_x f(X(k)) - E_π f(X(0)). Appropriate regularity hypotheses on X permit us to assert
that

$$\sup_{x\in S}\big|E_x f(X(k)) - E_\pi f(X(0))\big| = O(e^{-\beta k}) \tag{2.15}$$

for some β > 0. (See p. 122 of NUMMELIN (1984) for a typical such result.) Substituting
this relation in (2.14) yields

$$E_\pi Y_c(0)Y_c(k) = O(e^{-\beta k}).$$

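We may therefore conclude that

$$\sum_{k\ge u} k\,\big|E_\pi Y_c(0)Y_c(k)\big| = O(e^{-\beta' u}) \tag{2.16}$$

for 0 < β' < β. Substitution of (2.16) into (2.13) shows that

$$\operatorname{var}_\pi\Big(\frac{1}{t-s}\sum_{j=s+1}^{t}Y(j)\Big) = \frac{\sigma^2}{t-s} - \frac{\eta}{(t-s)^2} + O(e^{-\beta'(t-s)}). \tag{2.17}$$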

Combining (2.1), (2.2), (2.4), (2.10), (2.12), and (2.17), we obtain the important
relationship

$$\mathrm{MSE}(\bar{Y}(m,s,T)) = \frac{1}{m}\Big(\frac{\sigma^2}{t-s} - \frac{\eta}{(t-s)^2}\Big) + O(e^{-\alpha s}) + O(e^{-\beta'(t-s)}), \tag{2.18}$$

where the implicit constants appearing in the "big Oh" terms are independent of m, s, and
T.

To gain further insight into (2.18), we consider the typical situation, in which the
deletion point s is small relative to the length t of each replicate. Furthermore, in order to
simplify the discussion, we assume that mt = T (exactly). Then,

$$\frac{1}{m}\cdot\frac{\sigma^2}{t-s} = \frac{\sigma^2}{T} + \frac{\sigma^2 m s}{T^2} + O\Big(\frac{m^2 s^2}{T^3}\Big) \tag{2.19}$$

and

$$\frac{1}{m}\cdot\frac{\eta}{(t-s)^2} = \frac{m\eta}{T^2} + O\Big(\frac{m^2 s}{T^3}\Big). \tag{2.20}$$

Combining (2.18) through (2.20), we obtain the approximation

$$\mathrm{MSE}(\bar{Y}(m,s,T)) \approx \frac{\sigma^2}{T} + \frac{m(\sigma^2 s - \eta)}{T^2}. \tag{2.21}$$

Viewing s and m as design parameters for the simulation, we see that (2.21) suggests
that the deletion parameter s should be small. On the other hand, if s is chosen too small,
the "big Oh" terms appearing in (2.18) can no longer be neglected. This recommendation
corresponds to intuition.

As for the number of replications m, m should be chosen small (for example, a single
replicate method should be considered) whenever σ²s > η. For reasonable values of s, this
inequality will typically be valid. Thus, mean square error favors using a small number of
replicates. This differs from the conclusion reached by KELTON (1986) in his analysis of
"replication splitting" schemes for simulation of autoregressive sequences. The arguments
there show that using a large number of replicates can reduce the variance of the steady-state
estimator when the autoregressive sequence is positively correlated (i.e. η > 0).
In our current setting, we judge our estimators via mean square error (as opposed to
variance). Since our error criterion explicitly considers the loss in estimator efficiency due
to bias (variance does not measure bias), it is not surprising that our conclusions differ.
Of course, if s is small (more precisely, σ²s < η, so that bias is not a major problem), (2.21)
supports using a large number of replicates when η > 0.

To illustrate the above points, we calculate the mean square error of Ȳ(m, s, T) when
m = T^p (0 < p < 1) and s = T^q (0 < q < 1 - p), in which case t = T^(1-p). We find that

$$\mathrm{MSE}(\bar{Y}(m,s,T)) = \frac{\sigma^2}{T} + \sigma^2 T^{p+q-2} - \eta T^{p-2} + O(T^{2(p+q)-3}). \tag{2.22}$$

Assuming that p + q < 1/2 (so that the "big Oh" term is of smaller order than T^(-2)), we find
that relation (2.22) confirms the previous discussion. Both p and q should be chosen small,
in accordance with our previous recommendations.
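The trade-off in m described above can be explored numerically. The sketch below evaluates the leading term of (2.18) and the approximation (2.21); the AR(1) values of σ² and η (φ = 0.9, giving σ² = 100 and η ≈ 947.4) are an illustrative assumption, not data from the text, and the function names are hypothetical.

```python
def ar1_sigma2_eta(phi):
    """Closed-form sigma^2 and eta for Y(n+1) = phi*Y(n) + N(0, 1) (illustration only)."""
    sigma2 = 1.0 / (1.0 - phi) ** 2
    eta = 2.0 * phi / ((1.0 - phi ** 2) * (1.0 - phi) ** 2)
    return sigma2, eta


def leading_mse_rect(T, m, s, sigma2, eta):
    """Leading term of (2.18): (1/m) * (sigma2/(t-s) - eta/(t-s)**2), with t = T/m."""
    t = T / m
    return (sigma2 / (t - s) - eta / (t - s) ** 2) / m


def approx_mse_rect(T, m, s, sigma2, eta):
    """Approximation (2.21): sigma2/T + m*(sigma2*s - eta)/T**2."""
    return sigma2 / T + m * (sigma2 * s - eta) / T ** 2
```

With s = 100 one has σ²s = 10,000 > η, so (2.21) increases with m and favors few replicates; with s = 1 the sign of σ²s - η flips and more replicates reduce the approximate MSE, in line with the remark about η > 0.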

3. A Non-Rectangular Sampling Plan

The idea behind the sampling plan to be described in this section is that we try
to avoid expending a significant fraction of the computer time budget on generation of
highly biased observations. As discussed in the Introduction, the initial bias problem is
of particular concern when the method of multiple replicates is used, since the amount of
contaminated data is proportional to the number of replicates. On the other hand, the
method of multiple replicates enjoys several significant advantages: ease of construction
of confidence intervals and development of parallel simulation schemes. Our goal here is
to develop a method that has a multiple replicate flavor and yet avoids the initial bias
difficulties that are associated with conventional multiple replicate methods.

As in Section 2, we assume that the output sequence Y takes the form Y(n) = f(X(n))
for some time-homogeneous Markov chain X and real-valued function f. The following
algorithm employs one simulation of length s to generate an initial condition which is
reasonably typical of the steady state. This initial condition is then used to generate m
conditionally independent replicates (each of length t) of the output sequence Y. Thus,
the effort to generate a "good" initial condition is amortized over the m replicates. In
terms of observations generated, this sampling plan is non-rectangular (see Figure 2).

The non-rectangular sampling plan can be summarized as follows.

1.) Given the computer time budget T, choose the number m of (conditionally independent)
replicates, and the deletion parameter s (0 < s < T).

2.) Generate one copy Y_0 of the sequence Y to time s.

3.) Using the initial condition X_0(s) (X_0 is the Markov chain corresponding to Y_0), generate
m copies Y_1, ..., Y_m of Y to time t - 1, where t = ⌊(T - s)/m⌋.

4.) Compute the estimator

$$\tilde{Y}(m,s,T) = \frac{1}{mt}\sum_{i=1}^{m}\sum_{j=0}^{t-1} Y_i(j).$$
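A minimal sketch of the non-rectangular plan, again using the illustrative AR(1) output process as an assumption (X = Y, f is the identity, and r = 0). One warm-up run of length s produces X_0(s), and all m replicates restart from that single state, so the budget spent on deleted data is s rather than roughly ms. The function names and defaults are hypothetical.

```python
import random


def ar1_step(y, phi, rng):
    """One transition of the illustrative chain Y(n+1) = phi*Y(n) + N(0, 1)."""
    return phi * y + rng.gauss(0.0, 1.0)


def nonrectangular_estimate(T, m, s, x0=10.0, phi=0.9, seed=1):
    """Steps 1-4 of the non-rectangular plan with t = floor((T - s)/m)."""
    if not 0 < s < T:
        raise ValueError("need 0 < s < T")
    t = (T - s) // m
    if t < 1:
        raise ValueError("budget too small for m replicates")
    rng = random.Random(seed)
    # Step 2: a single run Y_0 up to time s; x_s plays the role of X_0(s).
    x_s = x0
    for _ in range(s):
        x_s = ar1_step(x_s, phi, rng)
    # Steps 3-4: m replicates Y_i(0), ..., Y_i(t-1), each started at X_0(s).
    total = 0.0
    for _ in range(m):
        y = x_s
        for _ in range(t):
            total += y
            y = ar1_step(y, phi, rng)
    return total / (m * t)
```

Only the warm-up loop is inherently sequential; once x_s is known, the m replicate loops are independent given x_s and could be assigned to separate processors.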

We now turn to computing the mean square error of Ỹ(m, s, T). As in Section 2,

$$\mathrm{MSE}(\tilde{Y}(m,s,T)) = \operatorname{var}\tilde{Y}(m,s,T) + \big(\operatorname{bias}\tilde{Y}(m,s,T)\big)^2. \tag{3.1}$$

Using the fact that Y_i(·) has the same distribution as Y(s + ·), we find that

$$\operatorname{bias}\tilde{Y}(m,s,T) = E_\mu\, b(X(s),\, t).$$

From (2.8) (recall that E_π b(X(0), t) = 0), it therefore follows that

$$\operatorname{bias}\tilde{Y}(m,s,T) = O(e^{-\alpha s}). \tag{3.2}$$

To handle the variance term appearing on the right-hand side of (3.1), we again use the
variance decomposition method:

$$\operatorname{var}\tilde{Y}(m,s,T) = \operatorname{var}E\{\tilde{Y}(m,s,T)\,|\,X_0(s)\} + E\operatorname{var}\{\tilde{Y}(m,s,T)\,|\,X_0(s)\}. \tag{3.3}$$

It is easily seen (use the fact that Y_1, ..., Y_m are independent and identically distributed,
conditional on X_0(s)) that

$$E\{\tilde{Y}(m,s,T)\,|\,X_0(s)\} = b(X_0(s),\, t) + r \quad \text{a.s.}, \tag{3.4}$$

$$\operatorname{var}\{\tilde{Y}(m,s,T)\,|\,X_0(s)\} = \frac{1}{m}\,v(X_0(s),\, t) \quad \text{a.s.} \tag{3.5}$$

Combining (3.3) through (3.5), we get

$$\operatorname{var}\tilde{Y}(m,s,T) = \frac{1}{m}E_\mu v(X(s),\, t) + \operatorname{var}_\mu b(X(s),\, t). \tag{3.6}$$

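As in Section 2, we obtain

$$\operatorname{var}\tilde{Y}(m,s,T) = \frac{1}{m}E_\pi v(X(0),\, t) + E_\pi b^2(X(0),\, t) + O(e^{-\alpha s}) \tag{3.7}$$

(use (2.7), (2.8), and (2.9)). Recall that

$$\operatorname{var}_\pi\Big(\frac{1}{t}\sum_{j=0}^{t-1}Y(j)\Big) = E_\pi v(X(0),\, t) + E_\pi b^2(X(0),\, t)$$

(see Section 2). Plugging into (3.7), we get

$$\operatorname{var}\tilde{Y}(m,s,T) = \frac{1}{m}\operatorname{var}_\pi\Big(\frac{1}{t}\sum_{j=0}^{t-1}Y(j)\Big) + \frac{m-1}{m}\,E_\pi b^2(X(0),\, t) + O(e^{-\alpha s}). \tag{3.8}$$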

The first term on the right-hand side of (3.8) was analyzed in (2.17). For the second term,
note that

$$b(x,t) = \frac{1}{t}\,b(x) - \frac{1}{t}\sum_{k=t}^{\infty}\big(E_x Y(k) - r\big),$$

where

$$b(x) = \sum_{k=0}^{\infty}\big(E_x Y(k) - r\big).$$

From (2.15), it is evident that

$$\sup_{x\in S}\Big|b(x,t) - \frac{1}{t}\,b(x)\Big| = O(e^{-\beta t}). \tag{3.9}$$

Consequently, we obtain the inequality

$$|b(X(0),t)| \le \frac{1}{t}\,|b(X(0))| + O(e^{-\beta t}). \tag{3.10}$$

Since E_π Y(k) = r, the expectations E_π b(X(0), t) and E_π b(X(0)) both vanish. From (3.10),
we therefore get

$$E_\pi b^2(X(0),t) \le \frac{1}{t^2}E_\pi b^2(X(0)) + O(e^{-\beta t})\,\frac{2}{t}\,E_\pi|b(X(0))| + O(e^{-2\beta t}).$$

A similarly derived lower bound yields the formula

$$E_\pi b^2(X(0),t) = \frac{1}{t^2}E_\pi b^2(X(0)) + O(e^{-\beta t}). \tag{3.11}$$

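Let b̄ = E_π b²(X(0)). To simplify the following discussion, assume t = (T - s)/m (exactly).
Combining (2.17), (3.8), and (3.11), we obtain the important relationship

$$\mathrm{MSE}(\tilde{Y}(m,s,T)) = \frac{1}{m}\Big(\frac{\sigma^2}{t} - \frac{\eta}{t^2}\Big) + \frac{m-1}{m}\cdot\frac{\bar{b}}{t^2} + O(e^{-\gamma t}) + O(e^{-\alpha s}), \tag{3.12}$$

where γ = min(β, β') and the (implicit) constants in the "big Oh" terms are independent of
m, s, and T. Expressing t in terms of m, s, and T, we get

$$\frac{1}{m}\cdot\frac{\sigma^2}{t} = \frac{\sigma^2}{T-s} = \frac{\sigma^2}{T} + \frac{\sigma^2 s}{T^2} + O\Big(\frac{s^2}{T^3}\Big), \tag{3.13}$$

$$\frac{1}{m}\cdot\frac{\eta}{t^2} = \frac{m\eta}{(T-s)^2} = \frac{m\eta}{T^2} + O\Big(\frac{ms}{T^3}\Big), \tag{3.14}$$

and

$$\frac{m-1}{m}\cdot\frac{\bar{b}}{t^2} = \frac{(m-1)m\bar{b}}{(T-s)^2} = \frac{(m-1)m\bar{b}}{T^2} + O\Big(\frac{m(m-1)s}{T^3}\Big), \tag{3.15}$$

assuming that s is small relative to T. Combining (3.12) through (3.15), we obtain the
approximation

$$\mathrm{MSE}(\tilde{Y}(m,s,T)) \approx \frac{\sigma^2}{T} + \frac{\sigma^2 s - m\eta + m(m-1)\bar{b}}{T^2}. \tag{3.16}$$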

We now compare the mean square error of our non-rectangular sampling plan with that
of a rectangular plan having the same computer time budget T, number of replications
m, and deletion parameter s. Comparing (3.16) to (2.21), we see that MSE(Ỹ(m, s, T)) <
MSE(Ȳ(m, s, T)) when

$$\sigma^2 m s > \sigma^2 s + \bar{b}\,(m^2 - m).$$

We shall shortly show that b̄ ≥ σ². Hence a necessary condition for Ỹ(m, s, T) to beat
Ȳ(m, s, T) is sm > s + m² - m, that is (for m > 1), s > m. This will typically occur when s is
large relative to m. Thus, we can expect Ỹ(m, s, T) to have smaller MSE than Ȳ(m, s, T)
whenever s must be chosen relatively large, in order to remove initial bias.

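We can illustrate this point when m = T^p (0 < p < 1) and s = T^q (0 < q < 1). Then, if
p + q < 1/2,

$$\mathrm{MSE}(\tilde{Y}(m,s,T)) = \frac{\sigma^2}{T} + \sigma^2 T^{q-2} - \eta T^{p-2} + \bar{b}\,\big(T^{2p-2} - T^{p-2}\big) + O(T^{2(p+q)-3}). \tag{3.17}$$

Comparing (3.17) to (2.22), we find that, for large T, MSE(Ỹ(m, s, T)) < MSE(Ȳ(m, s, T)) when p < q,
as was suggested above.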
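The comparison between (3.16) and (2.21) reduces to a single inequality. The following sketch evaluates both approximations and the criterion σ²ms > σ²s + b̄(m² - m); the numeric inputs in any call are the user's assumptions (for the illustrative AR(1) sequence with φ = 0.9 one finds σ² = 100, η ≈ 947.4, and b̄ ≈ 526.3, since there b(x) = x/(1 - φ)). Function names are hypothetical.

```python
def approx_mse_rect(T, m, s, sigma2, eta):
    """(2.21): approximate MSE of the rectangular estimator."""
    return sigma2 / T + m * (sigma2 * s - eta) / T ** 2


def approx_mse_nonrect(T, m, s, sigma2, eta, b_bar):
    """(3.16): approximate MSE of the non-rectangular estimator, b_bar = E_pi b(X(0))**2."""
    return sigma2 / T + (sigma2 * s - m * eta + m * (m - 1) * b_bar) / T ** 2


def nonrect_preferred(m, s, sigma2, b_bar):
    """True when (3.16) is below (2.21): sigma2*m*s > sigma2*s + b_bar*(m**2 - m)."""
    return sigma2 * m * s > sigma2 * s + b_bar * (m * m - m)
```

With b̄ = σ², `nonrect_preferred(10, 11, 100.0, 100.0)` is true while `nonrect_preferred(10, 10, 100.0, 100.0)` is false, which is the threshold s > m in its sharpest form; a larger b̄ pushes the required s higher.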
We conclude this section by showing b̄ ≥ σ². We first observe that b(·) solves Poisson's
equation

$$b(x) - E_x b(X(1)) = f_c(x),$$

where f_c(x) = f(x) - r. Additionally, E_π b(X(0)) = 0. Then,

$$\sum_{k=0}^{n-1} f_c(X(k)) = \sum_{k=1}^{n} D_k + b(X(0)) - b(X(n)),$$

where D_k = b(X(k)) - E{b(X(k)) | X(k - 1)} are martingale differences. Note that if X(0) has
distribution π, we can apply the martingale central limit theorem (see p. 205 of BILLINGSLEY (1968)) to
conclude that

$$n^{-1/2}\sum_{k=0}^{n-1} f_c(X(k)) \Rightarrow \lambda N(0,1), \tag{3.18}$$

where λ² = E_π D_1². (The function b(·) is bounded under (2.15).) If the left-hand side of (3.18)
is appropriately uniformly integrable, then

$$n^{-1}\operatorname{var}_\pi\sum_{k=0}^{n-1} f_c(X(k)) \to \lambda^2 \tag{3.19}$$

as n → ∞. But

$$\operatorname{var}_\pi\sum_{k=0}^{n-1} f_c(X(k)) = \operatorname{var}_\pi\sum_{j=0}^{n-1} Y(j).$$

From (2.17) and (3.19), it follows that λ² = E_π D_1² = σ². But D_1, being a martingale
difference, is orthogonal to every square-integrable function of X(0), in particular to
E{b(X(1)) | X(0)}, and hence

$$E_\pi b(X(1))^2 = E_\pi D_1^2 + E_\pi\big(E\{b(X(1))\,|\,X(0)\}^2\big).$$

Since b̄ = E_π b(X(0))² = E_π b(X(1))² by stationarity, it is evident that b̄ ≥ σ².
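The inequality b̄ ≥ σ² can be checked numerically on a small finite-state chain. The sketch below (pure Python; the transition matrices and functions f used with it are arbitrary illustrative choices, and the function names are hypothetical) computes π by power iteration, b(x) = Σ_k E_x f_c(X(k)) by truncating the series, σ² through the identity σ² = 2E_π[f_c(X(0)) b(X(0))] - E_π f_c(X(0))², and the gap term E_π(E{b(X(1)) | X(0)})² from the orthogonal decomposition above.

```python
def step_expectation(P, v):
    """(P v)(x) = sum_y P[x][y] v[y], i.e. x -> E_x v(X(1))."""
    return [sum(p_xy * v_y for p_xy, v_y in zip(row, v)) for row in P]


def invariant_distribution(P, iters=10_000):
    """Invariant distribution pi of an irreducible, aperiodic finite chain (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
    return pi


def sigma2_and_b_bar(P, f, terms=10_000):
    """Return (sigma^2, b_bar, E_pi (E{b(X(1)) | X(0)})^2) for Y(n) = f(X(n))."""
    pi = invariant_distribution(P)

    def e_pi(g):                       # E_pi g(X(0))
        return sum(p * gx for p, gx in zip(pi, g))

    r = e_pi(f)
    fc = [fx - r for fx in f]          # f_c = f - r
    b, v = [0.0] * len(f), fc[:]
    for _ in range(terms):             # b(x) = sum_k E_x f_c(X(k)), truncated
        b = [bx + vx for bx, vx in zip(b, v)]
        v = step_expectation(P, v)
    # sum_{k>=1} E_x f_c(X(k)) = b(x) - f_c(x), hence sigma^2 = 2 E_pi[f_c b] - E_pi f_c^2
    sigma2 = 2.0 * e_pi([a * c for a, c in zip(fc, b)]) - e_pi([a * a for a in fc])
    b_bar = e_pi([c * c for c in b])
    pb = step_expectation(P, b)        # x -> E{b(X(1)) | X(0) = x}
    return sigma2, b_bar, e_pi([c * c for c in pb])
```

For the two-state chain with P = [[0.8, 0.2], [0.3, 0.7]] and f = (0, 1), the exact values are σ² = 0.72 and b̄ = 0.96, so b̄ - σ² = 0.24, which equals the gap term E_π(E{b(X(1)) | X(0)})².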


4.  Conclusions 

The non-rectangular sampling plan introduced in this paper has a lower mean square
error than the corresponding rectangular plan that uses an equivalent amount of
computer time, when the "initial transient" decays slowly. This, of course, is precisely the
setting in which the method of multiple replicates exhibits its poorest behavior (relative to
a single replicate method). Thus, the non-rectangular plan described here is most beneficial
in those problems for which the method of multiple replicates is typically least effective.

It should be clear that the replication component of this non-rectangular plan is well
suited to parallel computation. However, the generation of the initial condition X_0(s) is
not easily adapted to the parallel setting. This aspect of the sampling plan described here
deserves further attention.

Finally,  it  should  be  mentioned  that  a  great  deal  of  empirical  work  remains  to  be  done 
in  understanding  the  advantages  and  limitations  of  this  non-rectangular  method,  when 
applied  to  “real  world”  problems. 
Covariance Analysis for Split Plot and Split
Block  Designs  and  Computer  Packages 


Walter  T.  Federer  and  Michael  P.  Meredith 
Mathematical  Sciences  Institute 
Cornell  University,  Ithaca,  N.Y.  14853 


ABSTRACT. Covariance analysis for data from experiments designed in a split
plot or split block design is mostly ignored in statistical literature.
When it is considered, it is often done incorrectly and/or incompletely.
This is especially true for computer packages. A discussion of what should
be  done,  what  is  or  can  be  done  with  computer  packages,  and  a  possible 
solution  to  the  problems  is  given.  The  proposed  solution  is  to  obtain 
computer  output  for  a  particular  package  such  as  SAS,  GENSTAT,  BMDP,  etc. 
and  to  annotate  the  output  explaining  which  computations  have  been 
performed,  which  have  not,  and  which  are  still  needed.  If  an  incorrect  or 
useless  procedure  has  been  given,  it  is  so  stated.  A  short  description  of 
annotated  computer  outputs  prepared  to  date  is  given.  Annotated  computer 
outputs  for  five  packages  for  principal  component  analyses,  and  for  three 
packages  for  covariance  in  a  split  plot  design  have  been  prepared.  Two 
technical  reports  and  an  annotated  computer  output  have  been  written  for 
cluster  analysis.  Copies  of  these  reports  are  available  from  the 
Mathematical  Sciences  Institute. 




COVARIANCE  ANALYSIS  FOR  SPLIT  PLOT  AND 


SPLIT  BLOCK  DESIGNS  AND  COMPUTER  PACKAGES 


Walter  T.  Federer  and  Michael  P.  Meredith 
337  Warren  Hall,  Biometrics  Unit 
Cornell  University 
Ithaca,  NY  14853 

BU-974-M*  May,  1988 


1. INTRODUCTION. Split plot and split block designs appear to be
rather mystifying to many individuals. They apparently are not cognizant
of the many and varied forms these designs may take, the philosophical
nature, concepts, and usage of the several error mean squares that are
required, and the nature and use of covariance analyses for these designs.
Since the computational procedures for an analysis of variance (ANOVA) for
orthogonal split plot and split block designs are trivial, many individuals
feel that the concepts are also simple. Contrary to some opinions,
computational procedures for an ANOVA do not explain concepts.

Yates (1937) described one type of split plot design as an example of
a class of designs. Unfortunately this one type of split plot design is
described as THE split plot design in almost all of the statistical literature,
especially in textbooks. Federer (1955, 1975, 1977) described some
variations, some misconceptions, and possible population structures for
these designs. With regard to the last point, a glaring omission in
statistics textbooks is the failure to include any discussion of population
structure for even the simplest of experiment designs. This necessarily
raises the question of meaningful inferences when the population is
undefined and undescribed.


*In the Technical Report Series of the Biometrics Unit, Cornell
University, Ithaca, N.Y. 14853.
When analyses of covariance (ANCOVA) are attempted, the confusion
continues. This becomes strikingly evident in outputs of computer
packages purporting to give such analyses for any but the simplest of
experiment designs (see, e.g., Federer, 1955, Federer et al. 1979, 1987a,
1987b, 1987c, and Searle et al. 1982a, 1982b, 1982c). The concept of a
separate regression for each error mean square is lacking in a number of
computer packages. Hence, if a package does supply output for means
adjusted for a covariate, the adjusted means given are often incorrect.
The fact that there may be as many regression coefficients as there are
error mean squares appears not to be understood. Since many regression
coefficients can be and are computed in an ANCOVA, it is important to
understand which ones are to be used for adjusting means for covariates and
why.

Herein we shall discuss ANCOVA for only three specific designs, namely

(i) the standard split plot design, where the whole plot treatments are
in a randomized complete block design and split plot treatments are
randomized within each whole plot,

(ii) a split-split plot design, which is the one in (i) except that the
split plot is further split to have whole plot treatments, split plot
treatments, and split-split plot treatments, and

(iii) a split block design or two-way whole plot design, where each set
of treatments is in a randomized complete block design arrangement.

In addition, a list of available annotated computer outputs (ACOs) is given
in the last section.

2. Split Plot Experiment Designs. The almost universal split plot
experiment design discussed in statistics textbooks is the one wherein the
whole plot treatments are in a randomized complete block design and the
split plots are completely randomized within each whole plot. Denote this
as the standard design. However, Federer (1955, 1975) has pointed out
that there is a vast variety of split plot experiment designs which are
used in practice. There are many different experiment designs for whole
plot treatments as well as for split plot treatments. Also, almost all
statistics textbooks confine their discussion to an ANOVA for the standard
split plot design with no discussion of an ANCOVA or of an ANOVA for
nonorthogonal situations. Computer packages such as SAS, GENSTAT, BMDP,
and others are set up to provide computations for nonorthogonal situations,
but a full description and use of computer output computations is lacking,
resulting in a need for annotated computer output (ACO). S. R. Searle and
several co-workers have been very active in this area. A list of ACOs
prepared by this group is given later in the paper. It should be noted
that Searle is currently updating a number of previously prepared ACOs.

In order to keep this paper relatively short, only the standard (or
usual) split plot experiment design will be considered in detail. Many
response models may be used for the vast variety of experiments designed as
a split plot, but we shall confine ourselves to the linear model in Federer
(1955). Let the ijk-th observation Y_ijk, with an associated covariate Z_ijk,
be represented as follows:

$$Y_{ijk} = \mu + \rho_j + \alpha_i + \delta_{ij} + \gamma_k + (\alpha\gamma)_{ik} + \beta_1(\bar{Z}_{ij\cdot} - \bar{Z}_{\cdots}) + \beta_2(Z_{ijk} - \bar{Z}_{ij\cdot}) + \epsilon_{ijk}, \tag{1}$$

where μ is an overall mean effect, α_i is the effect of the ith whole plot
treatment, γ_k is the effect of the kth split plot treatment, (αγ)_ik is the
interaction effect for the ik-th combination of whole plot treatment i and
split plot treatment k, ρ_j is a random block effect distributed with mean
zero and variance σ²_ρ, δ_ij is a random whole plot error effect distributed
with mean zero and variance σ²_δ, ε_ijk is a random split plot error effect
distributed with mean zero and variance σ²_ε, Z̄_ij. is the mean of the
covariate for the ij-th whole plot, Z̄... is the overall mean of the
covariate (i.e., the usual dot and bar notation), i = 1, ..., a, j = 1, ...,
r, k = 1, ..., s, β_1 is a whole plot linear regression coefficient of the Y
whole plot residuals on the Z whole plot residuals, and β_2 is a split plot
linear regression coefficient of the Y split plot residuals on the Z split plot
residuals. Note that using the estimates of β_1 and β_2, i.e., β̂_1 and β̂_2, to
adjust means is the correct thing to do. The purpose of using covariates
is to reduce the variation in observed Y variable means by measuring and
using an associated covariate. The reduction must then occur in the error
or residual line in the ANOVA. We have encountered individuals who did not
use this regression to adjust treatment means but used another regression,
e.g., the one from the total line in the ANCOVA. This is incorrect, yet it is
possible with present computer packages if the effect of the covariate is
eliminated first.

In some situations, the formulation of the response model as in (1) is
inappropriate. Although (1) could be appropriate for one variable or for
one investigation, it may not be for another. Also, as formulated, (1) has
two error effects, the δ_ij and the ε_ijk. When the whole plot treatments
represent a random sample of treatments from a population, the α_i are
distributed with mean zero and variance σ²_α. An appropriate error term
for the fixed split plot treatment effects would then be the whole plot by
split plot treatment interaction mean square. The (αγ)_ik would have
E[(αγ)_ik] = 0 and variance σ²_αγ. Likewise, in an ANCOVA, the appropriate
regression for split plot treatment means would be computed from the
interaction line rather than the error (b) line (see Table 1). In other
situations, the split plot treatments, or both split plot and whole plot
treatments, could be considered as a random sample of treatments, and the
effects would be random rather than fixed effects. Appropriate
modifications in the ANOVA and ANCOVA would be required for both situations.

A response model for variable Y is formulated, and then an ANCOVA as in
Table 1 is appropriate for a single covariate Z related to the variable Y
in a linear manner. Note that the relation between Y and Z could instead be
polynomial or nonlinear in nature. The number of covariates, say c, may
exceed one. This situation may be handled as a straightforward extension,
but we shall not consider these additional complexities. For response
model equation (1), the ANCOVA is given in Table 1. The sums of products
are computed in the usual manner. For example,

$$A_{yz} = s\sum_i\sum_j e_{yij}\,e_{zij}, \qquad e_{hij} = \bar{h}_{ij\cdot} - \bar{h}_{i\cdot\cdot} - \bar{h}_{\cdot j\cdot} + \bar{h}_{\cdots},$$

where e_yij is the whole plot residual for the variable Y alone and e_zij is the
whole plot residual for the variable Z alone, and

$$B_{yz} = \sum_i\sum_j\sum_k e_{yijk}\,e_{zijk}, \qquad e_{hijk} = h_{ijk} - \bar{h}_{ij\cdot} - \bar{h}_{i\cdot k} + \bar{h}_{i\cdot\cdot}, \quad h = y, z,$$

where the e_hijk are the computed split plot residuals for variable h. The above
computations, based on residuals, would still hold even for nonorthogonal experiment
designs. The mean squares in the ANCOVA are obtained by dividing by the
appropriate degrees of freedom. If, in addition to an ANCOVA, it is desired to
obtain F-statistics, the ratios

$$\frac{W'_{yy}(ar-a-r)}{A'_{yy}(a-1)}, \qquad \frac{S'_{yy}[a(r-1)(s-1)-1]}{B'_{yy}(s-1)}, \qquad \frac{I^*_{yy}[a(r-1)(s-1)-1]}{B'_{yy}(a-1)(s-1)}$$

may be computed, where W'_yy, S'_yy, and I*_yy are the adjusted sums of squares
in the last three lines of Table 1. Given that the δ_ij and ε_ijk are NIID (normally,
independently, and identically distributed), the probability of obtaining a larger
F-statistic may be obtained from prepared tables or computer programs. Even
if normality does not hold, the probabilities will be approximately correct
for most situations.
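For readers who want to reproduce the error (a) and error (b) lines of Table 1 for the standard split plot design, the sketch below computes whole plot residuals ȳ_ij. - ȳ_i.. - ȳ_.j. + ȳ... (with sums of products scaled by s) and split plot residuals y_ijk - ȳ_ij. - ȳ_i.k + ȳ_i.., and from them A_yy, A_yz, A_zz, B_yy, B_yz, B_zz. The data layout h[i][j][k] (whole plot treatment i, block j, split plot treatment k), the balanced orthogonal design, and the function names are assumptions; β̂1 = A_yz/A_zz and β̂2 = B_yz/B_zz then follow directly.

```python
def dot_means(h):
    """Dot-notation means of h[i][j][k]: whole plot treatment i, block j, split plot k."""
    a, r, s = len(h), len(h[0]), len(h[0][0])
    m_ij = [[sum(h[i][j]) / s for j in range(r)] for i in range(a)]                       # h_{ij.}
    m_ik = [[sum(h[i][j][k] for j in range(r)) / r for k in range(s)] for i in range(a)]  # h_{i.k}
    m_i = [sum(m_ij[i]) / r for i in range(a)]                                            # h_{i..}
    m_j = [sum(m_ij[i][j] for i in range(a)) / a for j in range(r)]                       # h_{.j.}
    return m_ij, m_ik, m_i, m_j, sum(m_i) / a                                             # h_{...}


def residuals(h):
    """Whole plot residuals e_{hij} and split plot residuals e_{hijk}."""
    a, r, s = len(h), len(h[0]), len(h[0][0])
    m_ij, m_ik, m_i, m_j, m = dot_means(h)
    e_wp = [[m_ij[i][j] - m_i[i] - m_j[j] + m for j in range(r)] for i in range(a)]
    e_sp = [[[h[i][j][k] - m_ij[i][j] - m_ik[i][k] + m_i[i] for k in range(s)]
             for j in range(r)] for i in range(a)]
    return e_wp, e_sp


def error_sums_of_products(y, z):
    """(A_yy, A_yz, A_zz) for error (a) and (B_yy, B_yz, B_zz) for error (b)."""
    s = len(y[0][0])
    (ey_wp, ey_sp), (ez_wp, ez_sp) = residuals(y), residuals(z)

    def flat(x):                       # flatten nested lists of residuals
        return [v for row in x for v in (flat(row) if isinstance(row, list) else [row])]

    def sp(u, v, scale=1.0):           # scaled sum of products of matching residuals
        return scale * sum(p * q for p, q in zip(flat(u), flat(v)))

    A = (sp(ey_wp, ey_wp, s), sp(ey_wp, ez_wp, s), sp(ez_wp, ez_wp, s))
    B = (sp(ey_sp, ey_sp), sp(ey_sp, ez_sp), sp(ez_sp, ez_sp))
    return A, B
```

Because these residuals remove block, whole plot treatment, split plot treatment, and interaction effects, data generated exactly from a model of the form (1) with the error terms set to zero return that model's β1 and β2 as A_yz/A_zz and B_yz/B_zz.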
Table 1. ANCOVA for equation (1) for a split plot experiment design†

                                            Sums of products            Adjusted
Source of variation        df               YY      YZ      ZZ      df               Sums of squares
Total                      ars              T_yy    T_yz    T_zz
Correction for mean        1                M_yy    M_yz    M_zz
Block                      r-1              R_yy    R_yz    R_zz
Whole plot = W             a-1              W_yy    W_yz    W_zz
Error (a)                  (a-1)(r-1)       A_yy    A_yz    A_zz    ar-a-r           A'_yy = A_yy - A_yz²/A_zz
Split plot = S             s-1              S_yy    S_yz    S_zz
S x W                      (a-1)(s-1)       I_yy    I_yz    I_zz    as-a-s           I'_yy = I_yy - I_yz²/I_zz
Error (b)                  a(r-1)(s-1)      B_yy    B_yz    B_zz    a(r-1)(s-1)-1    B'_yy = B_yy - B_yz²/B_zz
Whole plot (adj. for β̂1)                                            a-1              W'_yy = W_yy - (W_yz + A_yz)²/(W_zz + A_zz) + A_yz²/A_zz
Split plot (adj. for β̂2)                                            s-1              S'_yy = S_yy - (S_yz + B_yz)²/(S_zz + B_zz) + B_yz²/B_zz
S x W (adj. for β̂2)                                                 (a-1)(s-1)       I*_yy = I_yy - (I_yz + B_yz)²/(I_zz + B_zz) + B_yz²/B_zz

† The various mean squares may be obtained by dividing by the appropriate
degrees of freedom.
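Given the sums of products in Table 1, the adjusted sums of squares and the three F-ratios are simple arithmetic. A minimal sketch, assuming each line of the table is passed as a (yy, yz, zz) triple; the function names are hypothetical and no probabilities are computed (those still come from F tables or a statistics package).

```python
def residual_ss(yy, yz, zz):
    """Sum of squares of Y after regression on the covariate: yy - yz**2 / zz."""
    return yy - yz ** 2 / zz


def adjusted_treatment_ss(trt, err):
    """(T_yy + E_yy) - (T_yz + E_yz)**2/(T_zz + E_zz) - E'_yy for (yy, yz, zz) triples.

    Equivalent to T_yy - (T_yz + E_yz)**2/(T_zz + E_zz) + E_yz**2/E_zz.
    """
    pooled = tuple(t + e for t, e in zip(trt, err))
    return residual_ss(*pooled) - residual_ss(*err)


def ancova_f_ratios(a, r, s, W, A, S, I, B):
    """Covariate-adjusted F-ratios for whole plots, split plots, and S x W."""
    ms_a = residual_ss(*A) / (a * r - a - r)              # A'_yy / (ar - a - r)
    ms_b = residual_ss(*B) / (a * (r - 1) * (s - 1) - 1)  # B'_yy / [a(r-1)(s-1) - 1]
    f_w = adjusted_treatment_ss(W, A) / (a - 1) / ms_a
    f_s = adjusted_treatment_ss(S, B) / (s - 1) / ms_b
    f_sw = adjusted_treatment_ss(I, B) / ((a - 1) * (s - 1)) / ms_b
    return f_w, f_s, f_sw
```

With every YZ sum of products equal to zero, the adjustment changes nothing except removing one degree of freedom from each error line, which makes a quick sanity check on an implementation.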

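The various Y means adjusted for the covariate Z are:

$$\bar{Y}_{i\cdot\cdot}(\text{adj.}) = \bar{Y}_{i\cdot\cdot} - \hat{\beta}_1(\bar{Z}_{i\cdot\cdot} - \bar{Z}_{\cdots}) = \bar{Y}'_{i\cdot\cdot},$$

$$\bar{Y}_{\cdot\cdot k}(\text{adj.}) = \bar{Y}_{\cdot\cdot k} - \hat{\beta}_2(\bar{Z}_{\cdot\cdot k} - \bar{Z}_{\cdots}) = \bar{Y}'_{\cdot\cdot k},$$

and

$$\bar{Y}_{i\cdot k}(\text{adj.}) = \bar{Y}_{i\cdot k} - \hat{\beta}_1(\bar{Z}_{i\cdot\cdot} - \bar{Z}_{\cdots}) - \hat{\beta}_2(\bar{Z}_{i\cdot k} - \bar{Z}_{i\cdot\cdot}) = \bar{Y}'_{i\cdot k},$$

where β̂_1 = A_yz/A_zz, β̂_2 = B_yz/B_zz, and the usual dot notation is used
for the various means.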
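The adjusted means can be computed directly from the raw data once β̂1 and β̂2 are available. A sketch under the same assumed layout h[i][j][k] for a balanced split plot; `adjusted_means` is a hypothetical helper, not a routine from SAS, GENSTAT, or BMDP.

```python
def adjusted_means(y, z, beta1, beta2):
    """Return adjusted whole plot means Y'_{i..}, split plot means Y'_{..k},
    and cell means Y'_{i.k} for data h[i][j][k] (balanced design assumed)."""
    a, r, s = len(y), len(y[0]), len(y[0][0])

    def m_i(h):    # h_{i..}
        return [sum(sum(row) for row in h[i]) / (r * s) for i in range(a)]

    def m_k(h):    # h_{..k}
        return [sum(h[i][j][k] for i in range(a) for j in range(r)) / (a * r) for k in range(s)]

    def m_ik(h):   # h_{i.k}
        return [[sum(h[i][j][k] for j in range(r)) / r for k in range(s)] for i in range(a)]

    yi, zi = m_i(y), m_i(z)
    yk, zk = m_k(y), m_k(z)
    yik, zik = m_ik(y), m_ik(z)
    z_all = sum(zi) / a                                     # Z_{...}
    whole = [yi[i] - beta1 * (zi[i] - z_all) for i in range(a)]
    split = [yk[k] - beta2 * (zk[k] - z_all) for k in range(s)]
    cell = [[yik[i][k] - beta1 * (zi[i] - z_all) - beta2 * (zik[i][k] - zi[i])
             for k in range(s)] for i in range(a)]
    return whole, split, cell
```

The cell means use β̂1 for the between whole plot part of the covariate deviation and β̂2 for the within whole plot part, so each component of a mean is adjusted with the regression from its own error line.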


Estimated variances of a difference between two adjusted means for
i ≠ i' and k ≠ k' are:
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Variance  of  a  difference  between  two  adjusted  whole  plot  treatment  means 


V  (Y!  -  Y'  )  -  (3*  +  aep 

X*«  C  0 


Variance  of  a  difference  between  two  adjusted  split  plot  treatment  means 


'  2  , 
ar 


k  ~  .k' 


B 


zz 


] 


Variance  of  a  difference  between  two  adjusted  split  plot  treatment  means 
for  the  aame  whole  plot  treatment 


V  -  y;  „.)  - 


[  f 


Variance  of  a  difference  between  two  adjusted  whole  plot  treatment  means 
for  the  same  split  plot  treatment 


''  <n.k  -  'n'.k)  ■  f  *  n>  ♦  *  “n> 

zz 


+  € 


B 


zz 


(3^  +  s3p 

c  0 


Ayy  /  (ar-a-r),  3*  -  B^  /  [a(r-) ( s-1  )-l  ]  , 


and  5^  ■  (  (3^  +  s3*)  -  3*  ]  /  s 


I 


3*  is  associated  with  a(r-l)(B-l)-l  degrees  of  freedom,  (3*  +  s3*)  is  assoc- 

t  E  0 

iated  with  ar-r-a  degrees  of  freedom,  and  the  degrees  of  freedom  for  the  last 


variance  above  are  approximated  as  the  .degrees  of  freedom  f  associated  with 


t  (f>  - 


(s-l)(3*  +  sa\)  t  (ar-r-a)  +  3*t  [a(r-l)(s-l)-l ] 

C  0  tt  c  ct 

(s-1)  (3^  +  330  +  3* 

€  6  C 
where t_α(f) is the tabulated value of the t-statistic at the α percentage
level for f degrees of freedom. This approximation underestimates the
degrees of freedom for this variance (see Cochran and Cox, 1950, and
Grimes and Federer, 1984).
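The error estimates and the weighted critical value above can be sketched in Python as follows. This is a minimal illustration: E_a and E_b are shorthand introduced here for the two adjusted error mean squares, the function names are hypothetical, and the tabulated t values must be supplied by the user (2.571 and 2.086 in the test are the two-sided 5% points for 5 and 20 degrees of freedom, used only as sample inputs).

```python
def split_plot_error_estimates(a_adj_ss, b_adj_ss, a, r, s):
    """Return (E_a, E_b, sigma2_delta) from the adjusted error sums of squares.

    E_a = A'_yy / (ar-a-r) estimates sigma_eps^2 + s*sigma_delta^2;
    E_b = B'_yy / [a(r-1)(s-1)-1] estimates sigma_eps^2.
    """
    e_a = a_adj_ss / (a * r - a - r)
    e_b = b_adj_ss / (a * (r - 1) * (s - 1) - 1)
    sigma2_delta = (e_a - e_b) / s   # can come out negative in small samples
    return e_a, e_b, sigma2_delta


def var_whole_same_split(e_b, sigma2_delta, r):
    """Non-covariate part (2/r)(sigma_eps^2 + sigma_delta^2) of V(Y'_.ik - Y'_.i'k)."""
    return 2.0 / r * (e_b + sigma2_delta)


def weighted_t(e_a, e_b, s, t_a, t_b):
    """Weighted critical value t_alpha(f) for V(Y'_.ik - Y'_.i'k).

    t_a: tabulated t for ar-r-a df (Error (a));
    t_b: tabulated t for a(r-1)(s-1)-1 df (Error (b)).
    """
    w_b = (s - 1) * e_b          # weight of the split plot error component
    return (w_b * t_b + e_a * t_a) / (w_b + e_a)
```

Because the weights (s-1)E_b and E_a are exactly the two pieces of the variance, the weighted value always lies between the two tabulated t values.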

Given the above variances, one may now use a multiple range procedure to
compare individual pairs of means. Some authors (e.g. Cochran and Cox, 1950)
consider that there is a correlation between the split plot experimental
units. Hence, the whole plot expected error mean square would be given as
σ²[1 + (s-1)ρ] and the split plot error would be written as
σ²(1-ρ) = σ²_ε, where σ² = σ²_δ + σ²_ε and the correlation ρ is equal to
σ²_δ/σ². Although this formulation is useful for many situations, it is not
of universal application, e.g. when measurement error or competition exists
between split plot experimental units but not between whole plot
experimental units. Statistical modeling for any investigation should be
carefully considered.

3. Split-Split Plot Experiment Designs. For this class of designs,
various experiment designs may be used for whole plot treatments, for split
plot treatments, and for split-split plot treatments. However, we shall
confine our remarks to a single member of this class, i.e., the whole plot
treatments are arranged in a randomized complete blocks design, the split
plot treatments are randomly allocated to the split plot experimental units
within each whole plot unit, and the split-split plot treatments are
randomly assigned to the split-split plot experimental units within each
split plot experimental unit. There will be r randomizations for the a
whole plot treatments, ra randomizations for the s split plot treatments,
and ras randomizations for the p split-split plot treatments. The
treatment design considered here is a three-factor factorial with asp
combinations, but it should be noted that other treatment designs are
possible. The factors are assumed to be fixed effects to simplify
presentation.

One possible response model for the above experiment and treatment
design for a variable Y with a covariate Z is:

$$Y_{hijk} = \mu + \rho_h + \alpha_i + \delta_{hi} + \beta_1(Z_{hi..} - Z_{....}) + \tau_j + \alpha\tau_{ij} + \epsilon_{hij} + \beta_2(Z_{hij.} - Z_{hi..}) + \gamma_k + \alpha\gamma_{ik} + \tau\gamma_{jk} + \alpha\tau\gamma_{ijk} + \pi_{hijk} + \beta_3(Z_{hijk} - Z_{hij.}) \qquad (2)$$

where the first nine effects are as defined for equation (1), γ_k is the
effect of the kth split-split plot treatment, αγ_ik is a two-factor
interaction effect for combination ik, τγ_jk is a two-factor interaction
effect for combination jk, ατγ_ijk is a three-factor interaction effect for
combination ijk, π_hijk is a random error effect associated with split-split
plot experimental unit hijk and distributed with mean zero and variance
σ²_π, β3 is a linear regression coefficient of the split-split plot Y
residuals on the corresponding Z residuals, h = 1, ..., r, i = 1, ..., a,
j = 1, ..., s, and k = 1, ..., p. An ANCOVA for this design and response
model is given in Table 2.

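The various adjusted means are computed as:

$$Y_{.i..}(\text{adj.}) = Y_{.i..} - \beta_1(Z_{.i..} - Z_{....}) = Y'_{.i..},$$

$$Y_{..j.}(\text{adj.}) = Y_{..j.} - \beta_2(Z_{..j.} - Z_{....}) = Y'_{..j.},$$

$$Y_{...k}(\text{adj.}) = Y_{...k} - \beta_3(Z_{...k} - Z_{....}) = Y'_{...k},$$

$$Y_{.ij.}(\text{adj.}) = Y_{.ij.} - \beta_1(Z_{.i..} - Z_{....}) - \beta_2(Z_{.ij.} - Z_{.i..}) = Y'_{.ij.},$$

$$Y_{.i.k}(\text{adj.}) = Y_{.i.k} - \beta_1(Z_{.i..} - Z_{....}) - \beta_3(Z_{.i.k} - Z_{.i..}) = Y'_{.i.k},$$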
Table 2. ANCOVA for equation (2) for a split-split plot experiment design*

Sums of squares and products:

Source of variation      df                  YY      YZ      ZZ
Total                    rasp                T_yy    T_yz    T_zz
Correction for mean      1                   M_yy    M_yz    M_zz
Block                    (r-1)               R_yy    R_yz    R_zz
Whole plot = W           (a-1)               W_yy    W_yz    W_zz
Error (a)                (a-1)(r-1)          A_yy    A_yz    A_zz
Split plot = S           (s-1)               S_yy    S_yz    S_zz
S x W                    (a-1)(s-1)          I_yy    I_yz    I_zz
Error (b)                a(r-1)(s-1)         B_yy    B_yz    B_zz
Split-split plot = P     (p-1)               P_yy    P_yz    P_zz
W x P                    (a-1)(p-1)          Q_yy    Q_yz    Q_zz
S x P                    (p-1)(s-1)          U_yy    U_yz    U_zz
W x S x P                (a-1)(p-1)(s-1)     V_yy    V_yz    V_zz
Error (c)                as(r-1)(p-1)        C_yy    C_yz    C_zz

Adjusted sums of squares:

Source of variation   df                  Adjusted sum of squares
Error (a)             (ar-r-a)            A'_yy = A_yy - A_yz^2/A_zz
Error (b)             a(r-1)(s-1)-1       B'_yy = B_yy - B_yz^2/B_zz
Error (c)             as(r-1)(p-1)-1      C'_yy = C_yy - C_yz^2/C_zz
W                     (a-1)               W'_yy = W_yy - (W_yz + A_yz)^2/(W_zz + A_zz) + A_yz^2/A_zz
S                     (s-1)               S'_yy = S_yy - (S_yz + B_yz)^2/(S_zz + B_zz) + B_yz^2/B_zz
S x W                 (a-1)(s-1)          I'_yy = I_yy - (I_yz + B_yz)^2/(I_zz + B_zz) + B_yz^2/B_zz
P                     (p-1)               P'_yy = P_yy - (P_yz + C_yz)^2/(P_zz + C_zz) + C_yz^2/C_zz
W x P                 (a-1)(p-1)          Q'_yy = Q_yy - (Q_yz + C_yz)^2/(Q_zz + C_zz) + C_yz^2/C_zz
S x P                 (p-1)(s-1)          U'_yy = U_yy - (U_yz + C_yz)^2/(U_zz + C_zz) + C_yz^2/C_zz
W x S x P             (a-1)(p-1)(s-1)     V'_yy = V_yy - (V_yz + C_yz)^2/(V_zz + C_zz) + C_yz^2/C_zz

* The various mean squares may be obtained by dividing by the appropriate
degrees of freedom.
and

$$Y_{.ijk}(\text{adj.}) = Y_{.ijk} - \beta_1(Z_{.i..} - Z_{....}) - \beta_2(Z_{.ij.} - Z_{.i..}) - \beta_3(Z_{.ijk} - Z_{.ij.}) = Y'_{.ijk},$$

where β1 = A_yz/A_zz, β2 = B_yz/B_zz, and β3 = C_yz/C_zz.

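Estimated variances of a difference between two means adjusted for a
covariate, for i ≠ i', j ≠ j', and k ≠ k', with E_a = A'_yy/(ar-r-a),
E_b = B'_yy/[a(r-1)(s-1)-1], and E_c = C'_yy/[as(r-1)(p-1)-1], are given
below:

Variance of a difference between two whole plot treatment adjusted means

$$V(Y'_{.i..} - Y'_{.i'..}) = E_a\left[\frac{2}{rsp} + \frac{(Z_{.i..} - Z_{.i'..})^2}{A_{zz}}\right]$$

Variance of a difference between two split plot treatment adjusted means

$$V(Y'_{..j.} - Y'_{..j'.}) = E_b\left[\frac{2}{arp} + \frac{(Z_{..j.} - Z_{..j'.})^2}{B_{zz}}\right]$$

Variance of a difference between two split-split plot treatment adjusted means

$$V(Y'_{...k} - Y'_{...k'}) = E_c\left[\frac{2}{ars} + \frac{(Z_{...k} - Z_{...k'})^2}{C_{zz}}\right]$$

Variance of a difference between two adjusted means for combinations ik and i'k

$$V(Y'_{.i.k} - Y'_{.i'.k}) = \frac{2}{rs}(s\hat\sigma^2_\delta + \hat\sigma^2_\epsilon + \hat\sigma^2_\pi) + E_a\frac{(Z_{.i..} - Z_{.i'..})^2}{A_{zz}} + E_c\frac{(Z_{.i.k} - Z_{.i..} - Z_{.i'.k} + Z_{.i'..})^2}{C_{zz}}$$

Variance of a difference between two adjusted means for combinations ijk and ijk'

$$V(Y'_{.ijk} - Y'_{.ijk'}) = E_c\left[\frac{2}{r} + \frac{(Z_{.ijk} - Z_{.ijk'})^2}{C_{zz}}\right]$$

Variance of a difference between two adjusted means for combinations ijk and ij'k

$$V(Y'_{.ijk} - Y'_{.ij'k}) = \frac{2}{r}(\hat\sigma^2_\epsilon + \hat\sigma^2_\pi) + E_b\frac{(Z_{.ij.} - Z_{.ij'.})^2}{B_{zz}} + E_c\frac{(Z_{.ijk} - Z_{.ij.} - Z_{.ij'k} + Z_{.ij'.})^2}{C_{zz}}$$

Variance of a difference between two adjusted means for combinations ijk and i'jk

$$V(Y'_{.ijk} - Y'_{.i'jk}) = \frac{2}{r}(\hat\sigma^2_\delta + \hat\sigma^2_\epsilon + \hat\sigma^2_\pi) + E_a\frac{(Z_{.i..} - Z_{.i'..})^2}{A_{zz}} + E_b\frac{(Z_{.ij.} - Z_{.i..} - Z_{.i'j.} + Z_{.i'..})^2}{B_{zz}} + E_c\frac{(Z_{.ijk} - Z_{.ij.} - Z_{.i'jk} + Z_{.i'j.})^2}{C_{zz}}$$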

Note that, apart from the covariate terms, V(Y'_{.ijk} - Y'_{.i'jk}) =
V(Y'_{.ijk} - Y'_{.i'j'k}) = V(Y'_{.ijk} - Y'_{.i'jk'}) = V(Y'_{.ijk} - Y'_{.i'j'k'})
and V(Y'_{.ijk} - Y'_{.ij'k}) = V(Y'_{.ijk} - Y'_{.ij'k'}).

Most variances above without the covariate were given by Federer (1955).
Also, the expected values of E_a and E_b are σ²_π + pσ²_ε + psσ²_δ and
σ²_π + pσ²_ε, respectively. Estimates of the variance components σ²_δ, σ²_ε,
and E(E_c) = σ²_π are needed to compute the fifth, seventh, ninth, and tenth
variances above. The degrees of freedom for these variances need to be
approximated as they were in the previous section. Also note that
ps(σ̂²_π + σ̂²_ε + σ̂²_δ) = s(p-1)E_c + (s-1)E_b + E_a and
p(σ̂²_π + σ̂²_ε) = (p-1)E_c + E_b.
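The variance components can be obtained by equating the three error mean squares to their expected values, and the identities just stated then express the combined variances directly in terms of E_a, E_b and E_c. The Python sketch below illustrates this under those moment assumptions; the function names are hypothetical.

```python
def split_split_components(e_a, e_b, e_c, s, p):
    """Method-of-moments estimates (sigma2_delta, sigma2_eps, sigma2_pi) from
    E(E_a) = s_pi + p*s_eps + p*s*s_delta, E(E_b) = s_pi + p*s_eps, E(E_c) = s_pi."""
    sigma2_pi = e_c
    sigma2_eps = (e_b - e_c) / p
    sigma2_delta = (e_a - e_b) / (p * s)
    return sigma2_delta, sigma2_eps, sigma2_pi


def var_ijk_vs_iprime_jk(e_a, e_b, e_c, r, s, p):
    """Non-covariate part (2/r)(sigma_delta^2 + sigma_eps^2 + sigma_pi^2) of
    V(Y'_.ijk - Y'_.i'jk), written through the identity
    ps(sigma_pi^2 + sigma_eps^2 + sigma_delta^2) = s(p-1)E_c + (s-1)E_b + E_a."""
    return 2.0 * (s * (p - 1) * e_c + (s - 1) * e_b + e_a) / (r * p * s)


# Made-up error mean squares for s = 3 split plot and p = 4 split-split plot treatments.
d, e, pi = split_split_components(20.0, 8.0, 2.0, s=3, p=4)
```

Writing the variance as a weighted sum of E_a, E_b and E_c is what allows the same weighted-t approximation used for the split plot design to be applied here.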


4. Split Block Experiment Design. The experiment design considered
here is denoted as a split block design. It has also been called a two-way
whole plot and a strip trial design. This design has received no attention
in statistical textbooks, with the exception of Federer (1955). It does
occur frequently in practice but sometimes is not analyzed correctly as a
split block design. The member of this class of designs we shall discuss
will  be  for  a  two-factor  factorial  treatment  design  with  the  levels  of  one 
factor  being  applied  perpendicularly  across  all  levels  of  the  second  factor 
within  each  replicate  or  complete  block.  The  levels  of  each  factor  will 
have  the  same  design  for  our  example,  that  is  a  randomized  complete  block 
design.  (The  levels  of  one  factor  could  be  in  a  randomized  complete  block 
design  and  the  levels  of  the  second  factor  could  be  in  a  latin  square, 
balanced  incomplete  block,  or  other  experiment  design.)  Note  that  there 
will  be  r  separate  randomizations  for  the  levels  of  each  of  the  factors. 
The  number  of  levels  of  factor  one  is  a  and  the  number  of  levels  of  the 
second  factor  is  b,  resulting  in  an  a  x  b  factorial  treatment  design. 

A response model equation as given in Federer (1955) for a variable Y
and a covariate Z is:

$$Y_{hij} = \mu + \rho_h + \alpha_i + \delta_{hi} + \tau_j + \gamma_{hj} + \alpha\tau_{ij} + \epsilon_{hij} + \beta_1(Z_{hi.} - Z_{...}) + \beta_2(Z_{h.j} - Z_{...}) + \beta_3(Z_{hij} - Z_{hi.} - Z_{h.j} + Z_{...}) \qquad (3)$$

where μ is a general mean effect, ρ_h is the hth block effect, which has
mean zero and variance σ²_ρ, α_i is the effect of the ith level of factor
one, say A, τ_j is the effect of the jth level of factor two, say B, δ_hi is
a random error effect for the hith whole plot for factor A and has mean
zero and variance σ²_δ, γ_hj is a random error effect for the hjth whole
plot for factor B and has mean zero and variance σ²_γ, ατ_ij is the
interaction effect for the ijth combination of levels of factors A and B,
ε_hij is a random error effect associated with the hijth subplot for the
A x B interaction and has mean zero and variance σ²_ε, β1 is the linear
regression of the Y whole plot residuals on the Z whole plot residuals for
factor A, β2 is the linear regression of the Y whole plot residuals on the
Z whole plot residuals for factor B, and β3 is the linear regression of the
Y subplot residuals on the Z subplot residuals.

An ANCOVA for response model (3) is given in Table 3. For this design
and for fixed effects for the a x b factorial, there are three error
variances and three error regressions. Given that the error effects are
NIID, the usual F statistics may be used if desired. The adjusted means
are given by:

$$Y_{.i.}(\text{adjusted}) = Y_{.i.} - \beta_1(Z_{.i.} - Z_{...}) = Y'_{.i.},$$

$$Y_{..j}(\text{adjusted}) = Y_{..j} - \beta_2(Z_{..j} - Z_{...}) = Y'_{..j},$$

and

$$Y_{.ij}(\text{adjusted}) = Y_{.ij} - \beta_1(Z_{.i.} - Z_{...}) - \beta_2(Z_{..j} - Z_{...}) - \beta_3(Z_{.ij} - Z_{.i.} - Z_{..j} + Z_{...}) = Y'_{.ij},$$

where the βs are defined in Table 3.

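Estimated variances of a difference between adjusted means are given
below for i ≠ i', j ≠ j':

$$V(Y'_{.i.} - Y'_{.i'.}) = E_a\left[\frac{2}{rb} + \frac{(Z_{.i.} - Z_{.i'.})^2}{A_{zz}}\right],$$

$$V(Y'_{..j} - Y'_{..j'}) = E_b\left[\frac{2}{ra} + \frac{(Z_{..j} - Z_{..j'})^2}{B_{zz}}\right],$$

$$V(Y'_{.ij} - Y'_{.ij'}) = \frac{2}{r}(\hat\sigma^2_\gamma + \hat\sigma^2_\epsilon) + E_b\frac{(Z_{..j} - Z_{..j'})^2}{B_{zz}} + E_c\frac{(Z_{.ij} - Z_{.ij'} - Z_{..j} + Z_{..j'})^2}{C_{zz}},$$

$$V(Y'_{.ij} - Y'_{.i'j}) = \frac{2}{r}(\hat\sigma^2_\delta + \hat\sigma^2_\epsilon) + E_a\frac{(Z_{.i.} - Z_{.i'.})^2}{A_{zz}} + E_c\frac{(Z_{.ij} - Z_{.i'j} - Z_{.i.} + Z_{.i'.})^2}{C_{zz}},$$

and

$$V(Y'_{.ij} - Y'_{.i'j'}) = \frac{2}{r}(\hat\sigma^2_\delta + \hat\sigma^2_\gamma + \hat\sigma^2_\epsilon) + E_a\frac{(Z_{.i.} - Z_{.i'.})^2}{A_{zz}} + E_b\frac{(Z_{..j} - Z_{..j'})^2}{B_{zz}} + E_c\frac{(Z_{.ij} - Z_{.i'j'} - Z_{.i.} + Z_{.i'.} - Z_{..j} + Z_{..j'})^2}{C_{zz}},$$

where

$$E_a = A'_{yy}/(ar-a-r) = \hat\sigma^2_\epsilon + b\hat\sigma^2_\delta,\qquad E_b = B'_{yy}/(br-b-r) = \hat\sigma^2_\epsilon + a\hat\sigma^2_\gamma,\qquad E_c = C'_{yy}/[(a-1)(b-1)(r-1)-1] = \hat\sigma^2_\epsilon.$$

The degrees of freedom for the last three variances need to be approximated
by the method previously given or by some other appropriate approximation
(see e.g., Grimes and Federer, 1984).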
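For the split block design, the three error mean squares determine the A whole plot, B whole plot and subplot error components, and these feed the combined variances. The Python sketch below computes the most general case, a difference between cells that differ in both factors. It assumes the moment relations stated just above (A whole plot error E_a with b subplots per plot, B whole plot error E_b with a subplots per plot, subplot error E_c); the function names and the made-up numbers are illustrative only.

```python
def split_block_components(e_a, e_b, e_c, a, b):
    """(sigma2_A_whole, sigma2_B_whole, sigma2_sub) from E_a = s_sub + b*s_A,
    E_b = s_sub + a*s_B and E_c = s_sub (method of moments)."""
    return (e_a - e_c) / b, (e_b - e_c) / a, e_c


def var_ij_vs_iprime_jprime(e_a, e_b, e_c, a, b, r,
                            dz_a, dz_b, dz_ab, a_zz, b_zz, c_zz):
    """V(Y'_.ij - Y'_.i'j') for a split block design with one covariate.

    dz_a  = Z_.i. - Z_.i'.
    dz_b  = Z_..j - Z_..j'
    dz_ab = Z_.ij - Z_.i'j' - Z_.i. + Z_.i'. - Z_..j + Z_..j'
    """
    s_a, s_b, s_sub = split_block_components(e_a, e_b, e_c, a, b)
    return (2.0 / r * (s_a + s_b + s_sub)
            + e_a * dz_a ** 2 / a_zz
            + e_b * dz_b ** 2 / b_zz
            + e_c * dz_ab ** 2 / c_zz)


# Made-up values: a = 2, b = 4, r = 4 replicates.
v = var_ij_vs_iprime_jprime(10.0, 7.0, 2.0, 2, 4, 4,
                            dz_a=1.0, dz_b=0.0, dz_ab=0.0,
                            a_zz=5.0, b_zz=8.0, c_zz=3.0)
```

The other two combination variances follow by dropping the component and covariate term of the factor held fixed.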


Table 3. ANCOVA for equation (3) for a split block experiment design.

Sums of squares and products:

Source of variation      df                  YY      YZ      ZZ
Total                    rab                 T_yy    T_yz    T_zz
Correction for mean      1                   M_yy    M_yz    M_zz
Replicate = R            (r-1)               R_yy    R_yz    R_zz
Whole plot A             (a-1)               W_yy    W_yz    W_zz
Error (a)                (r-1)(a-1)          A_yy    A_yz    A_zz
Whole plot B             (b-1)               S_yy    S_yz    S_zz
Error (b)                (b-1)(r-1)          B_yy    B_yz    B_zz
A x B                    (a-1)(b-1)          I_yy    I_yz    I_zz
Error (ab)               (r-1)(a-1)(b-1)     C_yy    C_yz    C_zz

Adjusted sums of squares:

Whole plot A adjusted for β1 = A_yz/A_zz:
  Error (a)    (ra-a-r)             A'_yy = A_yy - A_yz^2/A_zz
  A            (a-1)                W'_yy = W_yy - (W_yz + A_yz)^2/(W_zz + A_zz) + A_yz^2/A_zz
Whole plot B adjusted for β2 = B_yz/B_zz:
  Error (b)    (rb-b-r)             B'_yy = B_yy - B_yz^2/B_zz
  B            (b-1)                S'_yy = S_yy - (S_yz + B_yz)^2/(S_zz + B_zz) + B_yz^2/B_zz
Interaction adjusted for β3 = C_yz/C_zz:
  Error (ab)   (r-1)(a-1)(b-1)-1    C'_yy = C_yy - C_yz^2/C_zz
  A x B        (a-1)(b-1)           I'_yy = I_yy - (I_yz + C_yz)^2/(I_zz + C_zz) + C_yz^2/C_zz
5. Some Comments. Since formulas for many of the above adjusted
means and variances do not appear in the statistical literature, it was
deemed appropriate to include them here. As can be seen from the analyses
for relatively simple designs from each of the three classes, there is a
variety of formulas for adjusted means and variances of differences between
two adjusted means. The more complex members of each class may have 5, 10,
15, or 20 error mean squares and the same number of regression
coefficients. Experiments are conducted wherein some of the factors are
arranged in split blocks and others in split plot arrangements. Many
different designs may be used for the different factors (see e.g., Federer,
1955, 1975). The most complex experiment design encountered is described
by Federer and Farden (1955), where there are several split plot and
several split block arrangements with a total of 75 error mean squares and
203 lines in the ANOVA.


One  method  of  aiding  investigators  with  ANOVAs  and  ANCOVAs  of 
complexly  designed  experiments  is  to  ascertain  how  much  of  a  statistical 
analysis  can  be  obtained  with  computer  packages  such  as  SAS,  BMDP,  GENSTAT, 
SPSS,  and  others.  Then,  the  output  can  be  annotated,  i.e.  an  explanation 
is  appended  to  the  computer  output  describing  what  has  been  computed  and 
how  to  use  the  results.  Annotated  computer  outputs  for  two  different  split 
plot  designs  with  a  covariate  have  been  completed  for  SAS,  BMDP,  and 
GENSTAT  (see  Federer  et  al.  1987a,  1987b,  1987c).  In  addition  to  these 
covariance  analyses,  annotated  computer  outputs  have  been  prepared  for 
principal  component  analysis  from  five  computer  packages  and  the  mixture 
method  of  cluster  analysis  on  SAS.  A  listing  of  these  is  given  in 
Appendix  A.  A  second  list  of  material  available  from  the  Biometrics  Unit 
is  given  in  Appendix  B. 

The analyses have been described for a single covariate. Noting that

$$A_{yy} - A_{yz}^2/A_{zz} = A_{yy}(1 - r^2_{yz}) = A'_{yy},$$

one may simply use A_yy(1 - R²) = A'_yy when there are several covariates,
where R² is the squared multiple correlation coefficient computed on the
error line. If the relationship between a covariate Z and Y is curvilinear,
it may be possible to use some function of Z, e.g. log Z, √Z, or 1/Z, which
makes the relation linear. If this can be accomplished, both computations
and interpretations are simplified.
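With several covariates, the adjusted error sum of squares A_yy(1 - R²) is obtained by solving the normal equations on the error line. A minimal Python sketch, assuming the error-line sums of squares and products of Y and of the k covariates are already available; the function name is hypothetical, and plain Gauss-Jordan elimination is used only to keep the example free of third-party libraries.

```python
def adjusted_ss_several_covariates(e_yy, e_yz, e_zz):
    """Adjusted error SS A'_yy = A_yy - b^T A_yz, where A_zz b = A_yz.

    e_yy: error sum of squares of Y; e_yz: list of k cross-products of Y with
    the covariates; e_zz: k x k matrix of covariate sums of squares/products.
    Returns (adjusted SS, R^2, regression coefficients).
    """
    k = len(e_yz)
    # Augmented matrix [A_zz | A_yz], reduced by Gauss-Jordan elimination.
    m = [list(map(float, e_zz[i])) + [float(e_yz[i])] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]          # partial pivoting
        for row in range(k):
            if row != col:
                factor = m[row][col] / m[col][col]
                for c in range(col, k + 1):
                    m[row][c] -= factor * m[col][c]
    coef = [m[i][k] / m[i][i] for i in range(k)]
    reg_ss = sum(bi * yz for bi, yz in zip(coef, e_yz))   # SS due to regression
    r2 = reg_ss / e_yy
    return e_yy - reg_ss, r2, coef
```

With a single covariate this reduces to A_yy - A_yz²/A_zz, and in every case the returned sum of squares equals A_yy(1 - R²).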

A simplification of the estimated variances for differences of means
has been given by Yates (1934) and Finney (1946). Instead of computing the
quantities (Z_{.i.} - Z_{.i'.})²/A_zz and (Z_{..k} - Z_{..k'})²/B_zz for each
pair of means, one may compute a single variance by using W_zz/[(a-1)A_zz]
and S_zz/[(s-1)B_zz], respectively; e.g., for the whole plot treatment means,

$$V(Y'_{.i.} - Y'_{.i'.}) \approx \frac{2(\hat\sigma^2_\epsilon + s\hat\sigma^2_\delta)}{rs}\left[1 + \frac{W_{zz}}{(a-1)A_{zz}}\right].$$

The quantity W_zz/(a-1) is, apart from the factor rs/2, the average of
(Z_{.i.} - Z_{.i'.})² over all pairs ii'. This simplification and
approximation considerably reduces the number of computations for large a
and/or s. For the quantities (Z_{.ij} - Z_{.i'j} - Z_{.i.} + Z_{.i'.})² and
(Z_{.ij} - Z_{.ij'} - Z_{..j} + Z_{..j'})², it is suggested that
I_zz/[(a-1)(s-1)B_zz] be used if it is desired to compute only a single
variance.
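The claim behind the Yates-Finney simplification can be checked numerically: averaging the pair-specific term (Z_.i. - Z_.i'.)²/A_zz over all pairs gives exactly 2W_zz/[rs(a-1)A_zz]. The Python sketch below assumes W_zz is computed with the usual per-plot scaling, W_zz = rs·Σ(Z_.i. - Z_...)²; the function names and data are hypothetical.

```python
from itertools import combinations


def pairwise_mean_term(z_means, a_zz):
    """Average over all pairs ii' of (Z_.i. - Z_.i'.)^2 / A_zz."""
    pairs = list(combinations(z_means, 2))
    return sum((zi - zj) ** 2 for zi, zj in pairs) / len(pairs) / a_zz


def finney_term(z_means, a_zz, r, s):
    """Single replacement term 2 W_zz / [rs (a-1) A_zz],
    with W_zz = rs * sum_i (Z_.i. - Z_...)^2."""
    a = len(z_means)
    zbar = sum(z_means) / a
    w_zz = r * s * sum((zi - zbar) ** 2 for zi in z_means)
    return 2.0 * w_zz / (r * s * (a - 1) * a_zz)


# Made-up covariate means for a = 4 whole plot treatments.
z = [2.5, 4.0, 1.5, 3.0]
print(pairwise_mean_term(z, 12.0), finney_term(z, 12.0, r=4, s=2))
```

So the single variance is the average of the pair-specific variances, which is why it serves as a reasonable summary when a is large.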
Appendix A

MSI ANNOTATED COMPUTER OUTPUT
ORDER FORM

1. COVARIANCE ANALYSIS FOR SPLIT PLOT DESIGN (Office Ref.)
   SAS ........ ACO #87-8 ..... ___ copies at $5 each*  $____
   BMDP 2V .... ACO #87-5 ..... ___ copies at $5 each*  $____
   GENSTAT .... ACO #87-4 ..... ___ copies at $5 each*  $____

2. PRINCIPAL COMPONENT ANALYSIS
   SAS ........ ACO #86-1 ..... ___ copies at $5 each   $____
   SYSTAT ..... ACO #87-6 ..... ___ copies at $5 each*  $____
   BMDP ....... ACO #87-7 ..... ___ copies at $5 each*  $____
   SPSS-X ..... ACO #87-2 ..... ___ copies at $5 each*  $____
   GENSTAT .... ACO #87-3 ..... ___ copies at $5 each*  $____

3. CLUSTER ANALYSIS (Mixture Method)
   TEXT ....... TR #86-38 ..... ___ copies at $5 each*  $____
   SAS ........ TR #87-5 ...... ___ copies at $5 each*  $____
     (Comparing 2 Clustering Methods to the Mixture Model Method)
   SAS ........ ACO #87-1 ..... ___ copies at $5 each*  $____
     (Annotated Computer Output for SAS, above)

TOTAL ..............................................  $____

* One copy is free for U.S. Army Personnel upon request.

Send check (payable to Cornell University) to:
Mathematical Consulting Liaison Group
Mathematical Sciences Institute
294 Caldwell Hall
Cornell University
Ithaca, New York, 14853, U.S.A.

The above order is to be sent to:

(please print)

NOTE: Orders will be mailed only after funds are received. This is our only invoice.
Appendix B ANNOTATED COMPUTER OUTPUT (ACO)

Order Form

Second Editions ACOs, 1988-9

(i) Analysis of Variance (Office Reference)
   BMDP2V .......................... (due Aug. '88) ..... ___ copies at $12 each  $____
   GENSTAT-ANOVA ................... (due Apr. '88) 962 . ___ copies at $12 each  $____
   SAS GLM ......................... 949 ................ ___ copies at $12 each  $____
   SAS HARVEY (First Edition) ...... 659 ................ ___ copies at $5 each   $____
   SPSSX ANOVA ..................... 955 ................ ___ copies at $12 each  $____

(ii) Variance Component Estimation
   ACOs: BMDP ...................... (due Feb. '89) ..... ___ copies at $12 each  $____
   ACOs: SAS HARVEY (First Edition)  723 ................ ___ copies at $5 each   $____
   ACOs: SAS RANDOM ................ (due June '88) .... ___ copies at $12 each  $____
   ACOs: SAS GLM VARCOMP ........... (due June '88) .... ___ copies at $12 each  $____

First Edition: ACO COV, 1982

Analysis of Covariance
   Text ............................ 780 ................ ___ copies at $5 each   $____
   BMDP (P1V, P2V, P4V) ...................................
   GENSTAT (ANOVA) ................. 782 ................ ___ copies at $5 each   $____
   SAS (GLM and HARVEY) ................................. ___ copies at $10 each  $____
   SPSS (ANOVA, MANOVA) ................................. ___ copies at $10 each  $____

Other Publications Available

1. SOLUTIONS MANUAL to Searle's LINEAR MODELS ........... ___ copies at $7 each  $____
2. SOLUTIONS MANUAL to Searle's MATRIX ALGEBRA USEFUL
   FOR STATISTICS ....................................... ___ copies at $7 each  $____
3. NOTES ON VARIANCE COMPONENTS by S.R. Searle .......... ___ copies at $7 each  $____
4. PROCEEDINGS: STATISTICAL DESIGN THEORY & PRACTICE,
   CONFERENCE IN HONOR OF W.T. FEDERER, 1986 ............ ___ copies at $20 each $____
5. EXERCISES FOR SIMPLE REGRESSION
   Program REGDATA ...................................... ___ copies at $5 each  $____
   List of 100 Data Sets ................................ ___ copies at $5 each  $____
6. BIBLIOGRAPHY ON EXPERIMENT AND TREATMENT DESIGN -
   PRE-1968 by W.T. Federer and L.N. Balaam ............. ___ copies at $10 each $____

TOTAL ...................................................................... $____

7. EXPERIMENTAL DESIGN by W.T. Federer .................. ___ copies at $13 each $____
   (check payable to W.T. Federer)

Send check (payable to Cornell University) to:
Biometrics Unit
336 Warren Hall
Cornell University
Ithaca, New York, 14853, U.S.A.

The above order is to be sent to:

(please print)

NOTE: Orders will be mailed only after funds are received. This is our only invoice.
ALTERNATIVES TO HYPOTHESIS TESTING
INCLUDING  A  MAXIMUM  LIKELIHOOD  ESTIMATE  TECHNIQUE 

NATHANAEL  ROMAN 

ARMY  MATERIEL  TEST  AND  EVALUATION  DIRECTORATE 
WHITE  SANDS  MISSILE  RANGE,  NEW  MEXICO 

1.  ABSTRACT 

Hypothesis  testing  is  often  used  to  make  decisions  with  respect  to  random 
data. 

In  Army  air  defense  system  specifications,  criteria  for  hypothesis  testing 
are  rarely  defined.  Therefore,  selection  of  pass/fail  and  risk  criteria  is 
arbitrary. Criteria are often chosen that tend to minimize the contractor's
or developer's risk and to maximize the system user's risk. Hence, selection
of hypothesis test criteria may compromise specification performance standards
and the system user's interests during the test and evaluation process.

Alternatives  to  hypothesis  testing  are  provided  that  use  the  specification 
performance  standard  itself  as  the  pass/fail  criterion  for  decision  making  and 
that  directly  indicate  the  risk  associated  with  any  resulting  conclusion.  The 
alternative  approach  includes  a  maximum  likelihood  estimate  technique  that 
compares  two  random  variables. 


2.  CONTRACTOR'S  VIEWPOINT 

The contractor asserts or assumes that the system was built such that the
population mean equals the requirement. Hence, the system was neither
overdesigned nor underdesigned. The sample distribution of sample means f(x̄)
would then be as illustrated in Figure 1. In this approach, the contractor
accepts a risk α that potentially maximizes the system user's risk. The
contractor's risk is the probability of rejecting a system that meets the
requirement, a Type I error. Typical values of α range from 0.10 through
0.01. If the contractor's assertion is accepted, then a one-sided hypothesis
test based on this approach would accept the null hypothesis that μ_p ≥ R for
any x̄ ≥ R_α and would reject the null hypothesis for any x̄ < R_α.
Figure 1. Sample Distribution of Sample Means

R = Requirement
N = Sample Size
μ_p = Population Mean
μ_x̄ = Mean Value of Sample Means (x̄)
σ_p = Population Standard Deviation
σ_x̄ = Standard Deviation of Sample Means = σ_p/√N
α = Level of Significance (one-sided) = ∫_{-∞}^{R_α} f(x̄) dx̄ = F(R_α)
R_α = A Sample Mean Determined by α
f(x̄) = probability density function in x̄


3.  SYSTEM  USER'S  VIEWPOINT 

However, if the contractor's assertion is incorrect (i.e., μ_p < R), and if
the sample mean distribution based on actual data represents the actual
population (i.e., μ_x̄ = μ_p and σ_x̄ = σ_p/√N), then the user's risk is:

$$\int_{R_\alpha}^{\infty} f(\bar{x})\,d\bar{x} \quad \text{for } R_\alpha < \mu_{\bar{x}} = \mu_p < R \qquad (1)$$

For the conditions given, the user's risk is the probability of accepting a
system that does not meet the requirement, a Type II error. The smaller the
α, the smaller the R_α, and the greater the user's risk. The user's risk
ranges between 0.5 and 1.0 minus α.
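Equation (1) can be evaluated once a distribution is assumed for the sample means. The Python sketch below assumes normally distributed sample means and uses the standard library's `statistics.NormalDist`; the function name and the sample numbers are hypothetical. R_α is found from F(R_α) = α under the contractor's assertion (sample means centred on R), and the user's risk is then P{x̄ > R_α} when the true mean is μ_p.

```python
from statistics import NormalDist


def hypothesis_test_risks(R, sigma_p, n, alpha, mu_p):
    """Critical sample mean R_alpha and the user's risk of equation (1).

    Assumes normal sample means with sigma_xbar = sigma_p / sqrt(n).
    """
    se = sigma_p / n ** 0.5
    r_alpha = NormalDist(R, se).inv_cdf(alpha)           # F(R_alpha) = alpha
    users_risk = 1.0 - NormalDist(mu_p, se).cdf(r_alpha)  # P{xbar > R_alpha}
    return r_alpha, users_risk


# Hypothetical requirement R = 100, sigma_p = 10, N = 25, alpha = 0.05.
r_alpha, risk = hypothesis_test_risks(100.0, 10.0, 25, 0.05, mu_p=98.0)
```

Moving μ_p from R_α up to just below R reproduces the range stated above: the user's risk grows from 0.5 toward 1 - α.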
Figure 2


4.  RECOMMENDED  ANALYTICAL  ALTERNATIVE  FOR  ASSESSING  REQUIREMENT 


System performance analysts must assess whether the specification
requirement was met or not met. If the analyst assumes that the data are
representative of the population (i.e., μ_x̄ = μ_p and σ_x̄ = σ_p/√N), then
from Figure 2 and for μ_x̄ < R, the probability of x̄ < R is:

$$P\{\bar{x} < R\} = \int_{-\infty}^{R} f(\bar{x})\,d\bar{x} \qquad (2)$$

(probability or confidence of not meeting the requirement).

The user's risk is:

$$\int_{R}^{\infty} f(\bar{x})\,d\bar{x} = 1 - \int_{-\infty}^{R} f(\bar{x})\,d\bar{x} \qquad (3)$$

(probability of x̄ meeting or exceeding the requirement).

This approach of using R itself as the basis for pass or fail decisions,
and therefore as an integration limit, resembles the approach from the
contractor's viewpoint in Figure 1 for α = 0.5. The underlying assumptions in
each approach, however, are different. In the recommended approach, the
user's risk will be between 0.5 (i.e., μ_x̄ = μ_p just below R) and 0.0
(i.e., μ_x̄ = μ_p ≪ R) when μ_x̄ < R. When μ_x̄ = μ_p > R, the contractor's
risk will be between 0.5 (i.e., μ_x̄ = μ_p = R) and 0.0 (i.e., μ_x̄ = μ_p ≫ R).
See Figure 3.

For μ_x̄ = μ_p > R and σ_x̄ = σ_p/√N,

$$P\{\bar{x} \ge R\} = \int_{R}^{\infty} f(\bar{x})\,d\bar{x} \qquad (4)$$

(probability of meeting or exceeding the requirement), and

$$\text{Contractor's Risk} = \int_{-\infty}^{R} f(\bar{x})\,d\bar{x} \qquad (5)$$

When pass/fail criteria are not completely specified in a specification,
the hypothesis test approach to decision making usually puts the system user
at a severe disadvantage with respect to risk, since α is usually chosen
arbitrarily small. If the hypothesis test technique is applied using the
contractor's assumption that μ_p = R, then the corresponding user's risk
[equation (1)] and the probability of not meeting the requirement [equation
(2)] should be quantified as well when μ_x̄ < R.
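In the recommended approach the only integration limit is R itself, so equations (2) through (5) reduce to one normal probability and its complement. The Python sketch below assumes normal sample means and takes the observed sample mean as μ_x̄ (the "data are representative" assumption); the function name and numbers are hypothetical.

```python
from statistics import NormalDist


def requirement_assessment(R, xbar, sigma_p, n):
    """Return (P{xbar < R}, P{xbar >= R}) with the sampling distribution
    centred on the observed mean and sigma_xbar = sigma_p / sqrt(n)."""
    dist = NormalDist(xbar, sigma_p / n ** 0.5)
    p_below = dist.cdf(R)
    return p_below, 1.0 - p_below


# Observed mean below R: confidence of not meeting R (equation (2))
# and the user's risk (equation (3)).
p_not_met, users_risk = requirement_assessment(R=100.0, xbar=97.0, sigma_p=10.0, n=25)
# Observed mean above R: probability of meeting R (equation (4))
# and the contractor's risk (equation (5)).
contractors_risk, p_met = requirement_assessment(R=100.0, xbar=103.0, sigma_p=10.0, n=25)
```

Whichever side of R the data fall on, the two returned numbers are the conclusion's probability and the risk in drawing it, each bounded by 0.5 as described above.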
5. HYPOTHESIS TEST FOR COMPARING TWO SAMPLE MEANS

Two sets of data for two independent random variables are often compared
to decide whether their sample means are the same or whether there is a trend.
Hypothesis testing is often used to make this decision. A sample distribution
of sample means may also be compared with the contractor's assumed
distribution of Figure 1 where the variances are equal.

Null Hypothesis: μ_p1 = μ_p2

Alternate Hypothesis: μ_p1 ≠ μ_p2

If |x̄1 - x̄2| < R_{α/2}, then accept the null hypothesis; otherwise, reject it
and accept the alternate hypothesis. As before, the choice of critical region
size is entirely arbitrary, since it is not usually contained in a system
specification. Hence, a more direct approach for comparing the two random
variables is desired.
6. FIRST ALTERNATIVE FOR COMPARING TWO SAMPLE DISTRIBUTIONS OF SAMPLE MEANS

From Figure 5, a measure of the dissimilarity of f(x̄1) and f(x̄2), where
μ_x̄1 > μ_x̄2, is:

$$\int_{0}^{\infty} f(\bar{x}_1 - \bar{x}_2)\,d(\bar{x}_1 - \bar{x}_2) \qquad (6)$$

The above equation provides the probability that x̄1 > x̄2. The risk in this
conclusion is:

$$\int_{-\infty}^{0} f(\bar{x}_1 - \bar{x}_2)\,d(\bar{x}_1 - \bar{x}_2) = 1 - \int_{0}^{\infty} f(\bar{x}_1 - \bar{x}_2)\,d(\bar{x}_1 - \bar{x}_2) \qquad (7)$$

Figure  5 
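For independent normal sample means, the difference x̄1 - x̄2 is itself normal, so equations (6) and (7) reduce to one normal probability. A minimal Python sketch under that normality assumption; the function name and values are hypothetical.

```python
from math import hypot
from statistics import NormalDist


def prob_first_exceeds(mu1, sd1, mu2, sd2):
    """Equations (6) and (7) for independent normal sample means.

    x1 - x2 is normal with mean mu1 - mu2 and standard deviation
    sqrt(sd1^2 + sd2^2). Returns (P{x1 > x2}, risk = P{x1 <= x2}).
    """
    diff = NormalDist(mu1 - mu2, hypot(sd1, sd2))
    risk = diff.cdf(0.0)
    return 1.0 - risk, risk


p_gt, risk = prob_first_exceeds(6.0, 3.0, 2.0, 4.0)   # difference ~ N(4, 5)
```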


7. SECOND ALTERNATIVE FOR COMPARING TWO SAMPLE DISTRIBUTIONS OF SAMPLE MEANS
USING A MAXIMUM LIKELIHOOD ESTIMATE TECHNIQUE

Another method for comparing two random variables involves finding a real
value c such that the joint probability that x̄1 > c and x̄2 < c is maximum.
The maximized joint probability then becomes a measure of the similarity or
dissimilarity of x̄1 and x̄2. The maximized joint probability will typically
range from 0.25 when μ_x̄1 = μ_x̄2 through 1.0 when μ_x̄1 ≫ μ_x̄2.

Figure 6

Find the value of c such that P{x̄1 > c, x̄2 < c} is maximum:

$$P\{A\} = P\{\bar{x}_1 > c\} = \int_{c}^{\infty} f_1(\bar{x}_1)\,d\bar{x}_1 = 1 - F_1(c)$$

$$P\{B\} = P\{\bar{x}_2 < c\} = \int_{-\infty}^{c} f_2(\bar{x}_2)\,d\bar{x}_2 = F_2(c)$$

$$P\{AB\} = P\{A\}\cdot P\{B\} \quad \text{(i.e., A and B are independent)}$$

Let X be a binary random variable where:

$$p = P\{AB\}, \qquad q = 1 - p = P\{A\bar{B} + \bar{A}B + \bar{A}\bar{B}\}$$

Setting the derivative of p with respect to c equal to zero gives

$$\frac{dp}{dc} = f_2(c)\int_{c}^{\infty} f_1(\bar{x}_1)\,d\bar{x}_1 - f_1(c)\int_{-\infty}^{c} f_2(\bar{x}_2)\,d\bar{x}_2 = 0 \qquad (11)$$

This equation is solved for the c which maximizes p.

The second derivative of p with respect to c is found to determine whether
a relative maximum or minimum is obtained:

$$\frac{d^2p}{dc^2} = \frac{df_2(c)}{dc}\int_{c}^{\infty} f_1(\bar{x}_1)\,d\bar{x}_1 - 2f_1(c)\,f_2(c) - \frac{df_1(c)}{dc}\int_{-\infty}^{c} f_2(\bar{x}_2)\,d\bar{x}_2 \qquad (12)$$

The value of c is substituted into this equation to confirm a maximum if
d²p/dc² < 0, or a minimum if d²p/dc² > 0. Note that in equation (12) the
integrals, f1(c), and f2(c) are always positive.
8. EXAMPLE 1 FOR COMPARING TWO SAMPLE DISTRIBUTIONS OF SAMPLE MEANS

$$p = \frac{(b_1 - c)(c - a_2)}{(b_1 - a_1)(b_2 - a_2)}$$
9. EXAMPLE 2 FOR COMPARING TWO SAMPLE DISTRIBUTIONS OF SAMPLE MEANS

f1(x̄1) and f2(x̄2) are isosceles triangles. For h1 = h2 and a2 < a1 < b2,
find c. Substituting into equation (11),

$$f_1(c)\int_{-\infty}^{c} f_2(\bar{x}_2)\,d\bar{x}_2 = f_2(c)\int_{c}^{\infty} f_1(\bar{x}_1)\,d\bar{x}_1 \quad\Longrightarrow\quad c = \frac{a_1 + b_2}{2} = \frac{\mu_{\bar{x}_1} + \mu_{\bar{x}_2}}{2}.$$

(d²p/dc²) < 0 from inspection of equation (12). This value of c maximizes

$$P\{\bar{x}_1 > c, \bar{x}_2 < c\} = P\{\bar{x}_1 > c\}\cdot P\{\bar{x}_2 < c\} = \left[1 - \frac{f_1(c)(c - a_1)}{2}\right]\left[1 - \frac{f_2(c)(b_2 - c)}{2}\right] = \left[1 - \frac{f_1(c)(c - a_1)}{2}\right]^2,$$

since f1(c) = f2(c) and (c - a1) = (b2 - c).
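The triangular example can be checked by brute force. The Python sketch below uses two hypothetical isosceles triangles of equal height, x̄1 on [2, 6] and x̄2 on [0, 4], which satisfy a2 < a1 < b2; a grid search over c should land on c = (a1 + b2)/2 = 3, where the joint probability is 0.875². The function names are illustrative only.

```python
def tri_cdf(x, a, b):
    """CDF of a symmetric (isosceles) triangular density on [a, b]."""
    m = (a + b) / 2.0
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= m:
        return (x - a) ** 2 / ((b - a) * (m - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - m))


def joint_prob(c, a1, b1, a2, b2):
    """p(c) = P{x1 > c} * P{x2 < c} for independent triangular x1, x2."""
    return (1.0 - tri_cdf(c, a1, b1)) * tri_cdf(c, a2, b2)


a1, b1, a2, b2 = 2.0, 6.0, 0.0, 4.0            # hypothetical triangles
c_star = (a1 + b2) / 2.0                         # closed-form optimum
grid = [a2 + i * (b1 - a2) / 6000 for i in range(6001)]
c_grid = max(grid, key=lambda c: joint_prob(c, a1, b1, a2, b2))
print(c_star, c_grid, joint_prob(c_star, a1, b1, a2, b2))
```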
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10.  EXAMPLE  3  FOR  COMPARING  TWO  SAMPLE  DISTRIBUTIONS  OF  SAMPLE  MEANS 


Figure 9


f1(x1) and f2(x2) are Gaussian probability density functions.

For σ1 = σ2, solve for c:

$$ f_1(c)\int_{-\infty}^{c} f_2(x_2)\,dx_2 = f_2(c)\int_{c}^{\infty} f_1(x_1)\,dx_1 $$

$$ c = \frac{\mu_1 + \mu_2}{2} $$

maximizes P{x1 > c, x2 < c} = P{x1 > c}·P{x2 < c}, since d²p/dc² < 0 from inspection of equation (12).


$$ P\{x_1 > c,\ x_2 < c\} = \int_{c}^{\infty} f_1(x_1)\,dx_1 \cdot \int_{-\infty}^{c} f_2(x_2)\,dx_2 $$


For σ1 ≠ σ2, a trial-and-error method may be used to estimate c to the desired accuracy such that equation (11) is satisfied. Once c is known, the values of P{x1 > c}, P{x2 < c}, and P{x1 > c, x2 < c} may be calculated.
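The trial-and-error step for Gaussian densities with σ1 ≠ σ2 can be automated. Below is a minimal sketch, assuming independent x1 and x2. It bisects on dp/dc, which is zero exactly where equation (11) holds. It also assumes the root lies within five of the larger standard deviations of the means. The function names are illustrative, not part of the original analysis.

```python
import math


def norm_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def norm_cdf(x, mu, sigma):
    """P{X < x}, written with erfc to keep precision in the lower tail."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


def norm_sf(x, mu, sigma):
    """P{X > x} = 1 - F(x), also via erfc to keep precision in the upper tail."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def dp_dc(c, mu1, s1, mu2, s2):
    """Derivative of p(c) = P{x1 > c} * P{x2 < c}; it vanishes where equation (11) holds."""
    return (norm_sf(c, mu1, s1) * norm_pdf(c, mu2, s2)
            - norm_pdf(c, mu1, s1) * norm_cdf(c, mu2, s2))


def best_threshold(mu1, s1, mu2, s2, tol=1e-10):
    """Bisection for the c maximizing P{x1 > c, x2 < c}.

    Returns (c, P{x1 > c}, P{x2 < c}, P{x1 > c, x2 < c}).
    """
    span = 5.0 * max(s1, s2)  # assumed search window around the two means
    lo, hi = min(mu1, mu2) - span, max(mu1, mu2) + span
    if not (dp_dc(lo, mu1, s1, mu2, s2) > 0.0 > dp_dc(hi, mu1, s1, mu2, s2)):
        raise ValueError("dp/dc does not change sign on the search window")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dp_dc(mid, mu1, s1, mu2, s2) > 0.0:
            lo = mid  # p still increasing: the maximum lies to the right
        else:
            hi = mid  # p decreasing (or flat): the maximum lies to the left
    c = 0.5 * (lo + hi)
    p1 = norm_sf(c, mu1, s1)
    p2 = norm_cdf(c, mu2, s2)
    return c, p1, p2, p1 * p2
```

For equal standard deviations, `best_threshold(2.0, 1.0, 0.0, 1.0)` lands on c ≈ 1.0, the midpoint (μ1 + μ2)/2 given above. Bisection is safe here because of log-concavity. The product of a Gaussian survival function and a Gaussian distribution function is log-concave. So dp/dc changes sign only once and the search cannot settle on a minimum.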


11.  SUMMARY 


This paper reviews the limitations of hypothesis testing, in that the system specification requirements may be compromised and the system user may be required to take an inordinate risk. It provides an analytical procedure that permits direct comparison of data with specification requirements, the probability associated with a conclusion, and the risk in making the conclusion, where the risk is more equitably distributed between the contractor and the system user. This technique is developed further to include comparison of two distributions of sample means. The latter includes a maximum likelihood estimate of the joint probability that one random variable is greater than some value c and the second random variable less than c, and the risk associated with the conclusion.


12.  ACKNOWLEDGEMENT 

I wish to thank the Army Materiel Test and Evaluation Directorate, White Sands Missile Range, New Mexico, and the Hawk Project Office, Redstone Arsenal, Alabama, for the opportunity and encouragement to document this analysis and present it at this conference.


New Sequential and Parallel Methods for Unconstrained Optimization

Robert  B.  Schnabel 
Department  of  Computer  Science 
University  of  Colorado 
Boulder,  Colorado  80309 

Extended Abstract. We give an overview of our recent research on several topics in unconstrained optimization. First we discuss methods for "orthogonal distance regression", data fitting when the measure of the distance from the data points to the fitted curve is the Euclidean distance rather than the standard vertical distance. We describe an efficient algorithm for this problem, and experience with its use. Second we summarize our research into "tensor methods" for nonlinear equations and optimization. These methods use higher order Taylor series information in a way that appears to significantly improve the performance of standard methods without significantly adding to their storage requirements or arithmetic cost per iteration. Finally we describe our research into parallel methods for optimization problems.

One of the most widely used methodologies in scientific and engineering research is the fitting of equations to data by least squares. In cases where significant observation errors exist in all data variables, however, the ordinary least squares approach, where all errors are attributed to the observation variable, is often inappropriate. An alternative approach, suggested by several researchers, involves minimizing the sum of squared orthogonal distances between each data point and the curve described by the model equation. We refer to this as orthogonal distance data fitting. We have developed a method for solving the orthogonal distance fitting problem that is a direct analog of the trust region Levenberg-Marquardt algorithm. The number of unknowns involved is the number of model parameters plus the number of data points, often a very large number. By exploiting sparsity, however, our algorithm has a computational effort per step which is of the same order as required for the Levenberg-Marquardt method for ordinary least squares. The description of this algorithm, an analysis of its mathematical properties, and the results of computational tests on some examples that illustrate some differences between the two approaches are given in Boggs, Byrd, and Schnabel [1987]. A software package that implements this approach is described in Boggs, Byrd, Donaldson, and Schnabel [1987], and is available from the authors.
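The difference between vertical and orthogonal distance is easiest to see for a straight line y = a + b·x. For that model alone, the orthogonal-distance fit with equal error variances in both variables has a classical closed form. The sketch below uses it. It is not the authors' trust-region Levenberg-Marquardt algorithm, which handles general nonlinear models. The function names are hypothetical. For general models, SciPy's `scipy.odr` module, which wraps the Fortran ODRPACK library, is one readily available implementation.

```python
import math


def _centered_sums(xs, ys):
    """Means and centered second moments used by both line fits."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def ols_line(xs, ys):
    """Ordinary least squares: minimizes squared vertical distances. Returns (a, b)."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    b = sxy / sxx
    return my - b * mx, b


def odr_line(xs, ys):
    """Orthogonal distance fit of y = a + b*x (equal error variances in x and y)."""
    mx, my, sxx, syy, sxy = _centered_sums(xs, ys)
    if sxy == 0.0:
        raise ValueError("slope is undetermined when the data are uncorrelated")
    d = syy - sxx
    b = (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    return my - b * mx, b


def orth_ss(a, b, xs, ys):
    """Sum of squared perpendicular distances from the points to the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (1.0 + b * b)
```

One property shows why ordinary least squares is unsuitable when both variables carry error. Swapping the roles of x and y inverts the orthogonal-distance slope exactly, so the product of the two slopes is 1. The two ordinary least squares slopes instead multiply to the squared correlation coefficient.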

Tensor methods are a new class of methods for solving systems of nonlinear equations and unconstrained optimization problems. Standard methods for nonlinear equations are related to Newton's method, and use a linear model of the nonlinear functions at each iteration. While they are effective on most problems, they are slow if the first derivative matrix at the root is singular or nearly singular. Tensor methods augment the standard linear model with a simple, low rank second order term, in a way that makes the method require no more function and derivative evaluations per iteration, and hardly more storage or arithmetic operations per iteration, than standard methods. In our tests, tensor methods are significantly more efficient than standard methods on both nonsingular and singular problems. This research is described in Schnabel and Frank [1984, 1987]. More recently we have developed tensor methods for unconstrained optimization. These methods augment the quadratic model, upon which standard optimization methods are based, with low rank third and fourth order terms. Again, the costs per iteration of the tensor method are hardly more than for the standard method, and the method requires substantially fewer total iterations and function evaluations in our tests. This research is described in Schnabel and Chow [1988].
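The idea of adding a second order term to Newton's linear model can be sketched in one variable, under simplifying assumptions. There the "low rank" term reduces to a single scalar t. As an assumption, t is chosen here so that the model reproduces f at the previous iterate, which needs no extra function evaluations. This is an illustration only, not the Schnabel-Frank algorithm. That algorithm works on systems, uses past points differently, and includes safeguards this sketch omits. All names are hypothetical.

```python
import math


def newton_1d(f, df, x, tol=1e-8, max_iter=200):
    """Plain Newton iteration; returns (x, iterations used)."""
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, k
        x -= fx / df(x)
    return x, max_iter


def tensor_step(fx, dfx, t):
    """Step d for the model m(d) = fx + dfx*d + 0.5*t*d**2."""
    if t == 0.0:
        return -fx / dfx                 # model is linear: Newton step
    disc = dfx * dfx - 2.0 * t * fx
    if disc < 0.0:
        return -dfx / t                  # no real root: vertex of the parabola minimizes |m(d)|
    r = math.sqrt(disc)
    roots = ((-dfx + r) / t, (-dfx - r) / t)
    return min(roots, key=abs)           # prefer the smaller step


def tensor_1d(f, df, x, tol=1e-8, max_iter=200):
    """One-variable tensor-style iteration; returns (x, iterations used)."""
    x_prev = f_prev = None
    for k in range(max_iter):
        fx, dfx = f(x), df(x)
        if abs(fx) < tol:
            return x, k
        if x_prev is None:
            step = -fx / dfx             # no history yet: Newton step
        else:
            s = x_prev - x
            # choose t so the model interpolates f at the previous iterate
            t = 2.0 * (f_prev - fx - dfx * s) / (s * s)
            step = tensor_step(fx, dfx, t)
        x_prev, f_prev = x, fx
        x += step
    return x, max_iter
```

The test problem f(x) = x², started from x = 1, has a singular root because f′(0) = 0. There Newton's method only halves the error each iteration. The interpolated quadratic model happens to be exact, so the tensor-style iteration reaches the root after its first Newton step plus one model step. On a nonsingular problem such as x² − 2, both iterations converge.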

Parallel optimization research at the University of Colorado has focused upon designing and implementing parallel algorithms for two optimization problems. One of these is the global optimization problem, which is to find the lowest minimizer of a nonlinear function of multiple variables that has multiple local minimizers. We have developed two types of parallel algorithms for this problem, both based on the stochastic approach of Rinnooy Kan and co-workers. The first is a rather straightforward parallelization of the sequential algorithm, while the second is a new, adaptive, dynamic method that is suggested by considerations of parallelism. Some of this research is described in Byrd, Dert, Rinnooy Kan, and Schnabel [1986]. The second optimization problem we have investigated is the standard local unconstrained optimization problem. We have studied new parallel optimization algorithms that use speculative function evaluations to evaluate part, but not all, of the Hessian matrix at each iteration. We have also analyzed alternatives for parallelizing the matrix updating calculations that constitute the main linear algebra cost of these methods. This research is described in Schnabel [1987] and Byrd, Schnabel, and Schultz [1988].
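The basic shape of a straightforward parallel stochastic global search can be sketched as follows. The code draws random start points, runs independent local searches concurrently, and keeps the lowest result. This is plain multistart with fixed-step gradient descent. It is not the clustering method of Rinnooy Kan and co-workers, nor the adaptive dynamic algorithm described above. The test function, step size, and worker count are assumptions for illustration. Python threads show the structure only; CPU-bound pure-Python work would need processes for real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def objective(x):
    """Hypothetical test function with local minimizers near x = -2 (global) and x = +2."""
    return (x * x - 4.0) ** 2 + x


def gradient(x):
    """Derivative of the objective."""
    return 4.0 * x * (x * x - 4.0) + 1.0


def local_descent(x, step=0.01, iters=500):
    """Fixed-step gradient descent from x; returns (local minimizer, objective value)."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x, objective(x)


def parallel_multistart(n_starts=64, lo=-3.0, hi=3.0, seed=0, workers=4):
    """Sample random start points, run local searches concurrently, keep the lowest result."""
    rng = random.Random(seed)
    starts = [rng.uniform(lo, hi) for _ in range(n_starts)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(local_descent, starts))
    return min(results, key=lambda r: r[1])
```

Each local search is independent, so this level of parallelism needs no communication until the final comparison. It also wastes effort when many starts fall into the same basin. Avoiding that waste is what the clustering-based stochastic methods aim at.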
1 Abstract


In studying the refraction of the helicopter sound field by a shear layer, we face the problem of the interaction of the sound field with another body (the shear layer). In this interaction, we need the induced velocity in addition to the pressure, since the boundary condition at the surface of the foreign body (shear layer) is expressed in terms of the normal velocity. Therefore, a formula in terms of the sound pressure only is not sufficient. We need both pressure and velocity expressions so that we can invoke the interface conditions (continuity of the pressure and continuity of the normal velocity).

We are therefore motivated to find two equations in terms of two acoustic fields: the pressure fluctuation and the velocity fluctuation.

In this paper, by defining two generalized functions, we develop an approach that yields two field equations. We suggest using these two equations in any interaction problem of the helicopter sound field and, in particular, in studying refraction by a shear layer for all frequency ranges.

It is also found that spectral methods seem to be more efficient in refraction problems.

2  Introduction 

We  present  an  alternate  analytical  description  of  the  acoustic  field  of  a  moving 
body  in  a  uniform  flow. 
Instead of using the Ffowcs Williams-Hawkings (FW-H) [1] version of the acoustic analogy, we formulate sources on a surface enclosing the moving body and its adjacent nonlinear flow field.

This approach avoids the laborious evaluation of quadrupole terms and can be considered a generalisation of the Kirchhoff-Helmholtz theorem of acoustics.

In the helicopter acoustics community, it has become a tradition to take the FW-H extension of Lighthill's acoustic analogy concept [2] as the starting point. In this general formulation, the acoustic field of a body moving in a locally nonuniform, unsteady flow field is expressed in terms of monopole and dipole source distributions over the body surface and a quadrupole source distribution over the volume containing the nonuniform, unsteady field in which the body moves.

Here the quadrupole source terms correspond to the nonlinearities in the flow equations.

At large distances the medium is at rest, apart from perturbations of acoustic order. Thus, in order to evaluate the quadrupole source terms, it is in principle necessary to know the complete flow field external to the body in advance.

3  An  Alternate  Formulation 

Our alternate formulation is as follows:

The acoustic field is described in terms of the flow variables at the outer boundary of the volume containing the quadrupole distribution. In this sense we can call it a generalisation of the Kirchhoff-Helmholtz theorem [3]. Some of the salient features of our formulation are:

• a surface integral has to be evaluated instead of a volume integral.

• laborious calculation of the complicated quadrupole terms is avoided.

• we have expressions for both pressure and velocity fluctuations to be used in solving interaction problems of the sound field with other bodies.

We consider a uniform flow in the fluid since we are interested in uniform forward motion of the body. For an arbitrary motion of the body, however, an arbitrary flow can be taken, and the method allows this. In a uniform flow, we show that not only is an acoustic field generated at the boundary, but a hydrodynamic, vortical velocity field also emerges naturally.

4 What Is the Relation to FW-H?

The two methods coincide when the induced velocity perturbations are small, which in turn implies that the quadrupole field is relatively weak and can therefore be neglected. The methods converge to each other because the volume sources in FW-H vanish and, in our formulation, the source surface shrinks to the actual body (blade) surface.

For conditions with a non-negligible quadrupole source field, it is straightforward to apply the present method with source surfaces (S = 0) at some distance from the blade surface, provided that the aerodynamic field is given.

By contrast, including the quadrupole source strength in an FW-H formulation requires considerable analytical and numerical effort.

5 Governing Equations

Let S(x, t) = 0 describe a surface enclosing a body moving in a flow such that, outside S = 0, the field is, to leading order, governed by the linearised flow equations for the pressure and velocity fluctuations (p, v) induced by the body. We denote by S < 0 the inside of the surface and by S > 0 the outside of the surface.

We can make the linearised flow equations formally valid throughout space by multiplying them by H(S), the Heaviside function of S [4,5]:


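$$ H(S)\left[\frac{Dp}{Dt} + \nabla\cdot\mathbf{v}\right] = 0 \qquad (1) $$

$$ H(S)\left[\frac{D\mathbf{v}}{Dt} + \nabla p\right] = 0 \qquad (2) $$

where

$$ \frac{D}{Dt} = \frac{\partial}{\partial t} + M\frac{\partial}{\partial x} \qquad (3) $$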


We nondimensionalise the equations using a characteristic length and the mass density and speed of sound in the fluid at infinity.

We have the following equations for the generalised pressure and generalised velocity outside S = 0:

$$ \frac{D}{Dt}[pH(S)] + \nabla\cdot[\mathbf{v}H(S)] = Q\,\delta(S) \qquad (4) $$

$$ \frac{D}{Dt}[\mathbf{v}H(S)] + \nabla[pH(S)] = \mathbf{F}\,\delta(S) \qquad (5) $$

where

$$ Q = p\frac{DS}{Dt} + \mathbf{v}\cdot\nabla S \qquad (6) $$

$$ \mathbf{F} = \mathbf{v}\frac{DS}{Dt} + p\nabla S \qquad (7) $$
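The surface source terms Q δ(S) and F δ(S) come from applying the product rule to the Heaviside factor, with dH/dS = δ(S). The short derivation below is consistent with definitions (6) and (7) and with the bracketed equations (1) and (2):

$$ \frac{D}{Dt}[pH(S)] = H(S)\frac{Dp}{Dt} + p\,\delta(S)\frac{DS}{Dt}, \qquad \nabla\cdot[\mathbf{v}H(S)] = H(S)\,\nabla\cdot\mathbf{v} + \delta(S)\,\mathbf{v}\cdot\nabla S $$

$$ \frac{D}{Dt}[pH(S)] + \nabla\cdot[\mathbf{v}H(S)] = \underbrace{H(S)\left[\frac{Dp}{Dt} + \nabla\cdot\mathbf{v}\right]}_{=\,0\ \text{by (1)}} + \left[p\frac{DS}{Dt} + \mathbf{v}\cdot\nabla S\right]\delta(S) = Q\,\delta(S) $$

$$ \frac{D}{Dt}[\mathbf{v}H(S)] + \nabla[pH(S)] = \underbrace{H(S)\left[\frac{D\mathbf{v}}{Dt} + \nabla p\right]}_{=\,0\ \text{by (2)}} + \left[\mathbf{v}\frac{DS}{Dt} + p\nabla S\right]\delta(S) = \mathbf{F}\,\delta(S) $$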


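Elimination of vH(S) results in a convected wave equation for the sound pressure field outside S = 0, driven by a surface source distribution at S = 0:

$$ \left(\nabla^2 - \frac{D^2}{Dt^2}\right)[pH(S)] = \nabla\cdot[\mathbf{F}\,\delta(S)] - \frac{D}{Dt}[Q\,\delta(S)] \qquad (8) $$

Eliminating pH(S) in the same way gives the corresponding equation for the velocity field:

$$ \left(\nabla^2 - \frac{D^2}{Dt^2} + \nabla\times\nabla\times\right)[\mathbf{v}H(S)] = \nabla[Q\,\delta(S)] - \frac{D}{Dt}[\mathbf{F}\,\delta(S)] \qquad (9) $$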

Green’s  function,  which  is  defined  as  the  acoustic  field  of  an  impulsive  point 
source,  has  to  satisfy 


hence  the  generalized  pressure  fluctuation  becomes 

$$ pH(S) = \iiiint \left[\nabla_{\eta} G\cdot\mathbf{F} - \frac{DG}{D\tau}\,Q\right]\delta(S)\,d\boldsymbol{\eta}\,d\tau \qquad (11) $$

This is the expression for the acoustic pressure of a source region in a uniform mean flow.

Note that we did not assume an irrotational velocity fluctuation field.
6 Green's Function Formalism


Green's functions are very useful for bringing in other boundaries in the fluid (such as wind tunnel walls, shear layers or other foreign bodies). There are two ways to introduce the Green's function: the traditional way and the less traditional spectral way. We think the spectral way is more advantageous, as will be shown later.


• The traditional Green's function:

$$ G = \frac{\delta(T - R)}{R} \qquad (12) $$

where

$$ T = \beta(t - \tau) + \frac{M(x - \xi)}{\beta} \qquad (13) $$

$$ \beta^2 = 1 - M^2 \qquad (14) $$

and R is the distance between the source and the observer. This form of the Green's function is the most appealing, as it clearly describes the spherical propagation of an acoustic pulse modified by mean flow convection. In the literature, the use of this form has led to time-domain methods [6].

Farassat's method [5] is a modern version of the time-domain method.


• The spectral Green's function: If we take a Fourier-Hankel transform, we obtain a frequency-domain result. In the literature, the use of this form has led to frequency-domain methods [6].

$$ G = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{0}^{\infty} \cdots \qquad (15) $$
In wave-number space, our alternate form of the Green's function becomes

$$ G(\alpha, \gamma, \omega \mid \xi, \rho, \phi, \tau) = \frac{e^{-i(n\phi + \alpha\xi + \omega\tau)}\, J_n(\gamma\rho)}{\gamma^2 + \alpha^2 - (\omega + M\alpha)^2} \qquad (16) $$

where ξ is the axial source coordinate, ρ is the radial source coordinate, τ is the source time, and φ is the angular source coordinate.


The advantage of the spectral Green's function is that the final solution for the acoustic field automatically yields an expansion in time-space harmonics.

It is also important to notice that, since our approach includes the possibility of a non-uniform incident field (and since the sound field is significantly affected by asymmetries in the flow), we should be able to describe the acoustic field well.


7 An Application: Refraction from a Shear Layer

To compute the interaction of the sound field with another body, we need the induced velocity in addition to the pressure, since the boundary condition at the foreign body surface is expressed in terms of the normal velocity.

We shall use spectral analysis to obtain an expression for the generalised velocity fluctuations:

$$ \hat{\mathbf{v}} = \int\!\!\int e^{-i(\alpha x + \omega t)}\,[\mathbf{v}H(S)]\,dx\,dt \qquad (17) $$
and similarly

$$ P = \int\!\!\int e^{-i(\alpha x + \omega t)}\,[pH(S)]\,dx\,dt \qquad (18) $$

$$ \hat{\mathbf{F}} = \int\!\!\int e^{-i(\alpha x + \omega t)}\,\mathbf{F}\,\delta(S)\,dx\,dt \qquad (19) $$

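Let us take the Fourier transform, in time and in the axial direction, of the governing equations

$$ \frac{D}{Dt}[pH(S)] + \nabla\cdot[\mathbf{v}H(S)] = Q\,\delta(S) \qquad (20) $$

$$ \frac{D}{Dt}[\mathbf{v}H(S)] + \nabla[pH(S)] = \mathbf{F}\,\delta(S) \qquad (21) $$

The transformed momentum equation (21) gives the radial and azimuthal velocity components

$$ \hat{v}_r = \frac{1}{i(\omega + M\alpha)}\left[\hat{F}_r - \frac{\partial P}{\partial r}\right] \qquad (22) $$

$$ \hat{v}_\theta = \frac{1}{i(\omega + M\alpha)}\left[\hat{F}_\theta - \frac{1}{r}\frac{\partial P}{\partial \theta}\right] \qquad (23) $$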

We should observe here that the velocity is expressed exclusively in terms of quantities on the surface S = 0.

The velocity in physical space and time can be obtained by inverse transforming:

$$ [\mathbf{v}H(S)](x, r, \theta, t) = \frac{1}{(2\pi)^2}\int\!\!\int \hat{\mathbf{v}}\; e^{i(\alpha x + \omega t)}\,d\alpha\,d\omega \qquad (24)\text{-}(25) $$

We can interpret this equation as a generalised theorem for the generalised velocity fluctuations.

The shear layer refraction problem thus reduces to a boundary-value problem for the governing equations, with additional boundary conditions coming from the shear layer model. We have introduced four different shear layer models [Unal, Tung, 7]; the simplest one is a vortex sheet. The boundary conditions on the vortex sheet are:

- Continuity of the normal velocity

- Continuity of the pressure

We end our paper by stating that the shear layer refraction of helicopter sound is a boundary-value problem defined on our two field equations.

We intend to solve this boundary-value problem numerically for different configurations and conditions.
INVERSE SOURCE MODELING IN HELICOPTER ACOUSTICS


A. Unal¹ and C. Tung²


1 Abstract

In our efforts to compute the sound created by a moving body (helicopter), we face the evaluation of the near field, either numerically or experimentally; both are difficult tasks and could easily lack precision.

To circumvent these difficulties, we recast the governing equations into nonlinear integral equations and define the characterisation of the helicopter sound source as an inverse problem using far field computations or measurements.

The logic of the approach lies in the fact that the far field is easier both to compute and to measure.

Like any other inverse problem, the helicopter source characterisation also faces the problem of multiple solutions.

We claim, and later demonstrate, that to the benefit of helicopter research the form of the kernel of the integral equation eliminates the problem of non-uniqueness.

Hence, we can use the inverse source modeling concept to obtain an equivalent source characterisation from the far field data, and then propagate the fields according to the governing equations to evaluate the acoustic pressure fluctuations at an arbitrary observer.


¹Dept. of Aero. and Astro., Stanford University, Stanford, California 94305.
²Research Scientist, US Army Aviation Research and Technology Activity, Moffett Field, CA 94035.
2 Mathematical Analysis


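We have formulated the helicopter acoustics problem in terms of two generalized functions and two field equations in [1].

These equations are:

$$ \left[\nabla^2 - \frac{D^2}{Dt^2}\right][pH(S)] = \nabla\cdot\left\{\left[\mathbf{v}\frac{DS}{Dt} + p\nabla S\right]\delta(S)\right\} - \frac{D}{Dt}\left\{\left[p\frac{DS}{Dt} + \mathbf{v}\cdot\nabla S\right]\delta(S)\right\} \qquad (1) $$

$$ \left[\nabla^2 - \frac{D^2}{Dt^2} + \nabla\times\nabla\times\right][\mathbf{v}H(S)] = \nabla\left\{\left[p\frac{DS}{Dt} + \mathbf{v}\cdot\nabla S\right]\delta(S)\right\} - \frac{D}{Dt}\left\{\left[\mathbf{v}\frac{DS}{Dt} + p\nabla S\right]\delta(S)\right\} \qquad (2) $$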

Once we have the governing field equations in hand, we can proceed via two approaches:

• Approach I: Inject the numerically simulated pressure and velocity fields on a chosen surface (a computational surface or an airfoil) into the right-hand-side source terms and solve the partial differential equations numerically.

• Approach II: Use the far field data and the concept of inverse source modeling to replace the right-hand side of the field equations by equivalent sources, then propagate the fields to an arbitrary observer location through the equations.

The difference between these two approaches lies in whether we are using near field data (for the first approach) or far field data (for the second approach).

Since it is always difficult to take near-field measurements precisely, and always more costly to compute the near field precisely than to measure or compute the far field, there is good reason to introduce inverse source modeling notions in helicopter acoustics.

Our approach will consist of two parts:

• Part I: Solve an inverse problem.

• Part II: Use the inverse source model of Part I to solve the direct problem.

3 Statement of the Problem

What is an equivalent source for our helicopter in its arbitrary motion, and is it unique?

4 Mathematical Analysis

Let us say that the sources are distributed within a region S of space with intensity Q(r, t), and that we have the governing equation (1) for what we named a generalized acoustic pressure fluctuation:

$$ \left[\nabla^2 - \frac{D^2}{Dt^2}\right][pH(S)] = Q(\mathbf{r}, t) \qquad (3) $$

which is an inhomogeneous convected scalar wave equation.

Let us say that we have an unbounded fluid medium with zero initial conditions, namely:

$$ pH(S) = \frac{\partial}{\partial t}[pH(S)] = 0 \qquad (4)\text{-}(5) $$

for t < 0.

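Using the Green's function formalism and generalized function theory [2], the Green's function satisfies

$$ \left[\nabla^2 - \frac{D^2}{Dt^2}\right] G(\mathbf{r}, t; \boldsymbol{\rho}, t') = -\delta(\mathbf{r} - \boldsymbol{\rho})\,\delta(t - t') \qquad (6) $$

and we can write down the solution as

$$ pH(S) = -\int dt' \int d\boldsymbol{\rho}\; G(\mathbf{r}, t; \boldsymbol{\rho}, t')\, Q(\boldsymbol{\rho}, t') \qquad (7) $$

Here the integrations are taken over the entire space-time domain. The causality property of G precludes integration over times t′ that are later than t, while the spatial integration extends only through the region S where Q(r, t) is nonzero.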

Let us interpret the integral equation:

• If Q(ρ, t′) is known, then we solve the direct problem and determine pH(S).

• If, however, Q(ρ, t′) is the unknown quantity, then the integral equation is a linear integral equation for Q(ρ, t′) with kernel G.

For any finite source distribution Q, the wave function pH(S) must fall off sufficiently rapidly as r approaches infinity.

$$ \underbrace{pH(S)}_{(1)} = -\int dt' \int d\boldsymbol{\rho}\; G(\mathbf{r}, t; \boldsymbol{\rho}, t')\,\underbrace{Q(\boldsymbol{\rho}, t')}_{(2)} \qquad (8) $$

Here (1) is a known quantity and (2) is an unknown quantity.

The integral equation is a Fredholm equation of the first kind. Usually the solution of this type of equation is not a simple matter, but in this case the kernel G is a Green's function. For such kernels, it is well known that a solution can be obtained if the Green's function can be expanded in a series of orthonormal functions.

However, there remains a problem of nonuniqueness, which we discuss thoroughly in the section on nonuniqueness.

• We can consider the integral equation as a description of pH(S): pH(S) is the integral transform of Q(ρ, t′) with kernel G. The inverse transform will then yield Q in terms of pH(S). This will work in cases where G has the form of a known integral transform.

Let us pose the problem as follows: If we know the far field values of pH(S) at a finite number of locations, can we solve for Q?

Answer: Yes, if we discretize the integrations, truncate any resulting infinite sums, and overcome the nonuniqueness.

From a physical point of view, it is clear that in order to obtain Q from the inverse transformation, it is necessary to know pH(S) over the entire domain of space and time. This seems to be an impossible problem at first sight but, as we shall see, it is possible to attack it as follows:


1. To invert the integral equation, solving for Q, we must know the Green's function G. The form of G depends on the properties of the unbounded medium that supports the acoustic waves propagating away from the source region S. For scalar waves, the medium properties appearing in the governing equation (such as the sound speed) may vary in time and space, in which case it is generally difficult to solve explicitly for G. But if the velocity of sound (c) is constant, then the Green's function is well known:

$$ G(\mathbf{r}, t; \boldsymbol{\rho}, t') = \frac{1}{4\pi\,|\mathbf{r} - \boldsymbol{\rho}|}\;\delta\!\left(t - t' - \frac{|\mathbf{r} - \boldsymbol{\rho}|}{c}\right) \qquad (9) $$

We can change the convected wave operator into the Helmholtz operator by using a transformation to moving coordinates. Let us then follow the analysis on this transformed equation.
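The Helmholtz equation is given by:

$$ [\nabla^2 + k^2]\,pH(\mathbf{r}, \omega) = Q(\mathbf{r}, \omega) \qquad (10) $$

which is the Fourier transform of the original equation with respect to time. Here, k = ω/c. The Green's function for this case is

$$ G(\mathbf{r}, \boldsymbol{\rho}) = \frac{1}{4\pi}\,\frac{e^{ik|\mathbf{r} - \boldsymbol{\rho}|}}{|\mathbf{r} - \boldsymbol{\rho}|} \qquad (11) $$

The equivalent integral equation is then

$$ pH(S)(\mathbf{r}, \omega) = -\frac{1}{4\pi}\int d\boldsymbol{\rho}\,\frac{e^{ik|\mathbf{r} - \boldsymbol{\rho}|}}{|\mathbf{r} - \boldsymbol{\rho}|}\,Q(\boldsymbol{\rho}, \omega) \qquad (12) $$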

The most crucial observation here is that the relation between Q(ρ, ω) and pH(S)(r, ω) is linear. This, in turn, means that we can use various approximate methods to solve for Q in terms of pH(S)(r, ω). The position vector r is entirely arbitrary. Hence, if pH(S) is known or measured at a sufficiently large number of arbitrary locations, an attempt can be made to solve for Q. In practice only two possibilities are of interest:

(a) when pH(S) is measured in the proximity of the helicopter,

(b) when pH(S) is measured far away from the helicopter, the far-field case.
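Because the relation between Q and pH(S) is linear, a discretized version of equation (12) becomes a matrix equation. The sketch below rests on a hypothetical setup. The source is represented by a few point sources at known positions. There is one far-field observer per unknown, so the system is square. In practice one would use more observers than unknowns and a regularized least-squares solve. The nonuniqueness discussed in Section 6 still applies to continuous sources. Helper names are illustrative.

```python
import cmath
import math


def green(k, r, rho):
    """Free-space Helmholtz Green's function of eq. (11): e^{ik|r-rho|} / (4*pi*|r-rho|)."""
    d = math.dist(r, rho)
    return cmath.exp(1j * k * d) / (4.0 * math.pi * d)


def forward(k, observers, sources, q):
    """Discretized eq. (12): pH(r_i) = -sum_j G(r_i, rho_j) q_j (q_j = strength times cell volume)."""
    return [-sum(green(k, r, rho) * qj for rho, qj in zip(sources, q)) for r in observers]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small complex square system a x = b."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def invert(k, observers, sources, p_measured):
    """Recover source strengths q_j from pressures measured at as many observers as sources."""
    a = [[-green(k, r, rho) for rho in sources] for r in observers]
    return solve(a, p_measured)
```

As a check, take three sources spaced half a wavelength apart along the z axis. Observe them at radius 50 in three directions. Inverting the synthetic data then recovers the assumed strengths to rounding error.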
We shall concentrate on the second case. The far-field assumption implies immediately that

$$ r = |\mathbf{r}| \gg |\boldsymbol{\rho}| \qquad (13) $$

in the support of Q, i.e., in the region where Q is nonzero; namely,

$$ |\mathbf{r} - \boldsymbol{\rho}| \simeq r - \hat{\mathbf{r}}\cdot\boldsymbol{\rho} \qquad (14) $$

If we use this inequality to expand the kernel,

$$ [pH(S)]_{\text{far}}(\mathbf{r}, k) = -\frac{e^{ikr}}{4\pi r}\int d\boldsymbol{\rho}\; e^{-i\mathbf{k}\cdot\boldsymbol{\rho}}\, Q(\boldsymbol{\rho}, \omega) \qquad (15) $$

but the integral is nothing other than the Fourier transform of the source distribution, the point in the transformed space being given by the values of the vector

$$ \mathbf{k} = k\,\hat{\mathbf{r}} \qquad (16) $$

With the foregoing interpretation, it would seem a simple matter of taking the inverse Fourier transform and thus solving explicitly for Q(r). But actually, the situation is somewhat more complicated, since the wave vector k, which of necessity becomes the integration variable in the three-dimensional integral defining the inverse Fourier transform, is limited by the condition

$$ |\mathbf{k}| = k = \frac{\omega}{c} \qquad (17)\text{-}(19) $$

Let us denote the Fourier transform of Q by Q̃(K, k); then

$$ \tilde{Q}(\mathbf{K}, k) = \int d\boldsymbol{\rho}\; e^{-i\mathbf{K}\cdot\boldsymbol{\rho}}\, Q(\boldsymbol{\rho}, k) \qquad (20) $$

The restriction on the wave vector can be accounted for by defining a generalized source (which is also an effective source) that is given in Fourier space by

$$ Q_e = \tilde{Q}(\mathbf{K}, k)\,\delta(K - k) \qquad (21) $$

Now, Q_e is defined for all possible K. This is indeed the definition of our third generalized function, which in turn defines the generalized effective source. Note that it might be mathematically more proper to call

$$ \tilde{Q}_e = Q_e \qquad (22) $$

the Fourier transform of the generalized source function, or the generalized equivalent source function in K space. This multiplication in K space will result in a convolution in real space. Note also that the physical dimensions of the generalized effective source are different from those of Q, because the delta function δ(K − k) has dimensions of length.
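Inverse Fourier transform yields:

$$ \begin{aligned} Q_e(\mathbf{K}, k) &= \tilde{Q}(\mathbf{K}, k)\,\delta(K - k) = \int d\boldsymbol{\rho}\; e^{-i\mathbf{K}\cdot\boldsymbol{\rho}}\, Q(\boldsymbol{\rho}, k)\,\delta(K - k) \\ Q_e(\mathbf{r}, k) &= \frac{1}{(2\pi)^3}\int d\mathbf{K}\; e^{i\mathbf{K}\cdot\mathbf{r}}\,\tilde{Q}(\mathbf{K}, k)\,\delta(K - k) \\ &= \frac{k}{2\pi^2}\int d\boldsymbol{\rho}\; Q(\boldsymbol{\rho}, k)\,\frac{\sin(k|\mathbf{r} - \boldsymbol{\rho}|)}{|\mathbf{r} - \boldsymbol{\rho}|} \\ \mathbf{k} &= \mathbf{k}(k, \theta, \phi) \end{aligned} \qquad (23)\text{-}(28) $$

$$ Q_e(\mathbf{r}, k) = -\frac{k^2 A\, e^{-ikA}}{2\pi^2}\int_0^{2\pi} d\phi \int_0^{\pi} d\theta\,\sin\theta\;[pH(S)]_{\text{far}}(A, \theta, \phi)\; e^{i\mathbf{k}\cdot\mathbf{r}} \qquad (29) $$

The right-hand-side integrals represent the angular Fourier transform of the measured values.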


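$$ A = cT \qquad (30) $$

A is the distance from the source region to the far-field point where measurements are taken, c is the speed of propagation of sound waves, and T is the travel time of the wave from the actual source to that point.

$$ \mathbf{k}\cdot\mathbf{r} = kr\cos\Theta \qquad (31) $$

where Θ is the angle between k and r, and

$$ \mathbf{r} = r\,(\sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta) \qquad (32) $$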
Equation  (29)  can  be  considered  as  an  algorithm  that  yields
the  effective  source  distribution  at  any  point  r  in  terms  of 
the  measured  values  of  the  pressure  in  the  far  field. 
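Equation (29) can be evaluated with a simple quadrature. The sketch below assumes the conventions used above: a Fourier transform with kernel e^{-iK·ρ} and a Green's function normalized as in (11). It uses midpoint sums over θ and φ. As a check, it feeds in the far field of a single point source Q = q0·δ(ρ). The recovered effective source is then k·q0·sin(kr)/(2π²r), a sinc-shaped blur whose first zero sits at kr = π, i.e. r = λ/2. This matches the smearing of a point source discussed below. Function names are illustrative.

```python
import cmath
import math


def far_field_point_source(q0, k, A):
    """Far-field pressure of Q = q0*delta(rho) at radius A, from eq. (12): -q0 e^{ikA} / (4 pi A)."""
    return -q0 * cmath.exp(1j * k * A) / (4.0 * math.pi * A)


def effective_source(r_vec, k, A, p_far, n_theta=200, n_phi=200):
    """Midpoint-rule evaluation of eq. (29) at the point r_vec.

    p_far(theta, phi) returns the far-field pressure measured at radius A in direction (theta, phi).
    """
    x, y, z = r_vec
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0j
    for i in range(n_theta):
        th = (i + 0.5) * d_theta
        st, ct = math.sin(th), math.cos(th)
        for j in range(n_phi):
            ph = (j + 0.5) * d_phi
            # k . r with k = k (sin th cos ph, sin th sin ph, cos th)
            k_dot_r = k * (st * math.cos(ph) * x + st * math.sin(ph) * y + ct * z)
            total += st * p_far(th, ph) * cmath.exp(1j * k_dot_r)
    total *= d_theta * d_phi
    return -(k * k * A * cmath.exp(-1j * k * A) / (2.0 * math.pi ** 2)) * total
```

Real measurements would replace `p_far` with interpolated data on the measurement sphere. The angular resolution of that data limits how sharply the source can be imaged.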

5 Equivalent Source Distribution at a Point

Let

$$ Q(\mathbf{r}) = f_0\,\delta(\mathbf{r}) \qquad (33) $$

A delta function is imaged as a distribution that is smeared out in space. The extent of smearing depends on the resolution of the imaging process.

Let us write the expansion for the equivalent point source in terms of spherical harmonics as:

$$ f = f_0\,P_0(\cos\theta) \qquad (34) $$

where P0 is the zero-order Legendre polynomial,

$$ P_0 = 1 \qquad (35) $$

so that

$$ f = f_0 \qquad (36) $$

Here f0 is the value of the form function.

$$ Q_e(\mathbf{r}) = 2\sum_{n=0}^{\infty} i^n f_n(k)\, j_n(kr)\, P_n(\cos\theta) \qquad (37) $$

$$ = 2 f_0\, j_0(kr) \qquad (38) $$

where j0 is the zero-order spherical Bessel function, which is the sinc function, j0(x) = sin x / x. Its first zero occurs at kr0 = π or, in terms of the wavelength λ = 2π/k, at

$$ r_0 = \frac{\lambda}{2} \qquad (39) $$

6  Non-Uniqueness 

It  is  now  a  well-established  fact  that  the  Fredholm  integral 
equations  of  the  first  kind  do  not  always  possess  unique 
solutions.  Solutions  depend  on  arbitrary  constants,  which 
must  ultimately  be  determined  from  criteria  not  given  in 
the  original  problem. 

For certain kernels, such as Fourier, Laplace, Mellin, and others, it is well known that the appropriate integral equations have unique solutions with only mild restrictions imposed on the allowed class of functions.

The effective source distribution in Fourier space is

$$Q_e(\mathbf{K},k) = Q(\mathbf{K},k)\,\Delta(\mathbf{K},k) \qquad (40)$$

$$\Delta(\mathbf{K},k) = \delta(|\mathbf{K}|-k) \qquad (41)$$

Although $Q(\mathbf{K},k)$ was known only on the spherical shell $|\mathbf{K}| = k$, the choice of $\Delta$, albeit arbitrary, extends the definition
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of the function Q to all points of K space. This extension modifies the integral equation into one with an unrestricted Fourier kernel, and thus into one with a unique solution.

In other words, the introduction of the generalized function (also called a filter function) into the integral equation is a mathematical device that removes the "defect" in the kernel. In the present case, this defect is the restriction of the kernel to the shell $|\mathbf{K}| = k$, which is responsible for the non-uniqueness. The price paid for regaining uniqueness is that a related quantity is obtained instead of the original quantity of interest. The latter may be viewed as a filtered version of the original quantity in physical space.
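The shell restriction can be made concrete with a numerical sketch (an illustration, not the paper's construction). For a spherically symmetric source $q(r)$, the 3D Fourier transform reduces to $4\pi\int_0^\infty r^2 q(r)\,\sin(Kr)/(Kr)\,dr$. The hypothetical source $q = (\nabla^2 + k^2)\varphi$ with $\varphi = e^{-r^2}$ has transform $(k^2 - K^2)\,\pi^{3/2} e^{-K^2/4}$. This vanishes on $|\mathbf{K}| = k$ but nowhere else, so adding it to any Q leaves the far-field data unchanged. The value k = 2, the Gaussian $\varphi$, and Simpson's rule are all assumptions of this example.

```python
import math

def radial_fourier(q, K, r_max=10.0, n=4000):
    """3D Fourier transform of a spherically symmetric q(r) at |K| = K:
    4*pi * int_0^r_max r^2 q(r) sin(K r)/(K r) dr, by composite Simpson's rule."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        kr = K * r
        sinc = 1.0 if kr == 0.0 else math.sin(kr) / kr
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * r * r * q(r) * sinc
    return 4.0 * math.pi * total * h / 3.0

k = 2.0  # hypothetical measurement wavenumber

def q_nonradiating(r):
    # (laplacian + k^2) exp(-r^2), using laplacian exp(-r^2) = (4 r^2 - 6) exp(-r^2)
    return (4.0 * r * r - 6.0 + k * k) * math.exp(-r * r)

def q_analytic(K):
    # closed-form transform of q_nonradiating: (k^2 - K^2) pi^(3/2) exp(-K^2/4)
    return (k * k - K * K) * math.pi ** 1.5 * math.exp(-K * K / 4.0)

on_shell = radial_fourier(q_nonradiating, K=k)         # ~0: invisible in the data
off_shell = radial_fourier(q_nonradiating, K=2.0 * k)  # clearly nonzero off the shell
```

The factor $(k^2 - K^2)$ plays the same role as the factor $(K - k)$ in the choice of $Q^0(\mathbf{K})$ made in the text.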

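$$\int d\mathbf{p}\, e^{-i\mathbf{K}\cdot\mathbf{p}}\, Q^0(\mathbf{p}) = 0 \qquad (42)$$

for $|\mathbf{K}| = k$. Choose a $Q^0(\mathbf{p})$ such that its Fourier transform vanishes for $|\mathbf{K}| = k$.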


Q°(K)  =(K-k)e-“*^ 

(43) 

II 

1 

•st 

fb 

1 

0 

(44) 

These  are  spherically  symmetric  distributions  in  K  space 
which,  in  turn,  yield  spherically  symmetric  distributions  in 
physical  space  given  by 
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inserting  back. 


$$= \delta(\mathbf{r}-\mathbf{r}_0) \qquad (56)$$

which  is  indeed  the  correct  source. 
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I.  Abstract 

A Multigrid Alternating Direction Implicit Scheme has been developed to solve the Euler equations of inviscid, compressible flow in three dimensions. The scheme is an extension of the two-dimensional scheme developed by Caughey [1] to treat three-dimensional problems. The multigrid method is an efficient technique for accelerating the convergence of iterative methods; the Alternating Direction Implicit scheme holds the promise of rapid convergence, especially on the highly stretched meshes required to resolve the thin shear layers appearing in high Reynolds number flows; and the diagonalization procedure results in a computationally efficient implementation of the ADI scheme. The scheme is applied to compute the transonic flow past a swept wing, and results are presented to confirm the accuracy of the method and illustrate the efficiency of the iterative algorithm.
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Diagonal  Implicit  Multigrid  Solution 

II.  Analysis 


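The Euler equations of inviscid, compressible flow can be written in three space dimensions as

$$\frac{\partial w}{\partial t} + \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} = 0 \qquad (1)$$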

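where

$$w = \{\rho,\ \rho u_1,\ \rho u_2,\ \rho u_3,\ e\}^T \qquad (2a)$$

is the vector of conserved dependent variables, and

$$f_1 = \{\rho u_1,\ \rho u_1^2 + p,\ \rho u_1 u_2,\ \rho u_1 u_3,\ u_1(e+p)\}^T \qquad (2b)$$

$$f_2 = \{\rho u_2,\ \rho u_2 u_1,\ \rho u_2^2 + p,\ \rho u_2 u_3,\ u_2(e+p)\}^T \qquad (2c)$$

$$f_3 = \{\rho u_3,\ \rho u_3 u_1,\ \rho u_3 u_2,\ \rho u_3^2 + p,\ u_3(e+p)\}^T \qquad (2d)$$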

are the flux vectors in the x, y, and z directions, respectively. The pressure p is related to the total energy e by the equation of state

$$e = \frac{p}{\gamma-1} + \frac{\rho(u_1^2+u_2^2+u_3^2)}{2} = \rho H_0 - p, \qquad (3)$$

where $\gamma$ is the ratio of specific heats and $H_0$ is the total enthalpy.
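The conserved variables (2a), the fluxes (2b)-(2d), and the equation of state (3) fit in a few lines. This is a hypothetical illustration, not code from the paper: the function names and the default $\gamma = 1.4$ (air) are assumptions.

```python
def pressure(w, gamma=1.4):
    """Pressure from w = (rho, rho*u1, rho*u2, rho*u3, e), inverting Eq. (3)."""
    rho, m1, m2, m3, e = w
    return (gamma - 1.0) * (e - 0.5 * (m1 * m1 + m2 * m2 + m3 * m3) / rho)

def euler_fluxes(w, gamma=1.4):
    """Cartesian flux vectors f1, f2, f3 of Eqs. (2b)-(2d)."""
    rho, m1, m2, m3, e = w
    u = (m1 / rho, m2 / rho, m3 / rho)
    p = pressure(w, gamma)
    fluxes = []
    for d in range(3):
        # mass, three momentum components, energy
        f = [rho * u[d]] + [rho * u[d] * u[j] for j in range(3)] + [u[d] * (e + p)]
        f[1 + d] += p  # pressure acts on the momentum component along direction d
        fluxes.append(f)
    return fluxes

# rho = 1, u = (1, 2, 3), p = 1  ->  e = p/(gamma-1) + rho|u|^2/2 = 2.5 + 7 = 9.5
f1, f2, f3 = euler_fluxes([1.0, 1.0, 2.0, 3.0, 9.5])
```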

In order to allow for the treatment of arbitrary geometries, the algorithm is implemented within the framework of a finite volume approximation [2]. Under an arbitrary transformation of the independent variables to a new curvilinear coordinate system, the equations can be written as

$$\frac{\partial W}{\partial t} + \frac{\partial F_1}{\partial \xi} + \frac{\partial F_2}{\partial \eta} + \frac{\partial F_3}{\partial \zeta} = 0, \qquad (4)$$

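where $W = hw$ is the vector of transformed dependent variables, and

$$F_1 = h\{\rho U_1,\ \rho u_1 U_1 + \xi_x p,\ \rho u_2 U_1 + \xi_y p,\ \rho u_3 U_1 + \xi_z p,\ U_1(e+p)\}^T \qquad (5a)$$

$$F_2 = h\{\rho U_2,\ \rho u_1 U_2 + \eta_x p,\ \rho u_2 U_2 + \eta_y p,\ \rho u_3 U_2 + \eta_z p,\ U_2(e+p)\}^T \qquad (5b)$$

$$F_3 = h\{\rho U_3,\ \rho u_1 U_3 + \zeta_x p,\ \rho u_2 U_3 + \zeta_y p,\ \rho u_3 U_3 + \zeta_z p,\ U_3(e+p)\}^T \qquad (5c)$$

are the transformed flux vectors. Here, h is the determinant of the Jacobian of the transformation (which corresponds to the cell volume), and $U_1$, $U_2$, and $U_3$ are the contravariant components of the velocity given by

$$\begin{pmatrix} U_1 \\ U_2 \\ U_3 \end{pmatrix} = \begin{pmatrix} \xi_x & \xi_y & \xi_z \\ \eta_x & \eta_y & \eta_z \\ \zeta_x & \zeta_y & \zeta_z \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \qquad (6)$$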
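Equations (5a)-(5c) and (6) can be sketched as follows. The function names are hypothetical, and the metric terms are passed in as a plain 3x3 list whose rows are $(\xi_x,\xi_y,\xi_z)$, $(\eta_x,\eta_y,\eta_z)$, $(\zeta_x,\zeta_y,\zeta_z)$. How the paper's finite-volume code actually evaluates these metrics is not described here.

```python
def contravariant_velocity(metrics, u):
    """Eq. (6): U_l = sum_j metrics[l][j] * u_j for l = 0, 1, 2."""
    return [sum(metrics[l][j] * u[j] for j in range(3)) for l in range(3)]

def transformed_flux(l, h, metrics, rho, u, p, e):
    """Eqs. (5a)-(5c): F_l = h * {rho U_l, rho u_j U_l + grad_l[j] p, U_l (e + p)}."""
    U = contravariant_velocity(metrics, u)[l]
    grad = metrics[l]  # (xi_x, xi_y, xi_z) for l = 0, and so on
    momentum = [h * (rho * u[j] * U + grad[j] * p) for j in range(3)]
    return [h * rho * U] + momentum + [h * U * (e + p)]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
F1 = transformed_flux(0, 1.0, identity, 1.0, [1.0, 2.0, 3.0], 1.0, 9.5)
```

With the identity transformation and h = 1, $F_1$ reduces to the Cartesian flux $f_1$ of (2b), which is a convenient consistency check.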
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Artificial dissipation is added as a blend of second and fourth differences of the solution. The fourth differences are necessary to ensure convergence to a steady state, while the second differences are necessary to prevent excessive oscillations of the solution in the vicinity of shock waves. Following the two-dimensional implementation of Caughey [1], these dissipation terms are scaled to stabilize the one-dimensional problems in each coordinate direction.
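The paper does not give the dissipation operator explicitly in this section. The following is only a generic sketch of a second/fourth-difference blend along one grid line, $d_i = \epsilon_2(w_{i+1} - 2w_i + w_{i-1}) - \epsilon_4(w_{i+2} - 4w_{i+1} + 6w_i - 4w_{i-1} + w_{i-2})$. It uses constant, hypothetical coefficients eps2 and eps4, with no shock sensor and none of the per-direction scaling mentioned above.

```python
def blended_dissipation(w, eps2, eps4):
    """Constant-coefficient blend of second and fourth differences on a 1D line.
    Returns d_i for interior points i = 2 .. n-3; the two points at each end are left at 0."""
    n = len(w)
    d = [0.0] * n
    for i in range(2, n - 2):
        second = w[i + 1] - 2.0 * w[i] + w[i - 1]
        fourth = w[i + 2] - 4.0 * w[i + 1] + 6.0 * w[i] - 4.0 * w[i - 1] + w[i - 2]
        d[i] = eps2 * second - eps4 * fourth
    return d

linear = blended_dissipation([0, 1, 2, 3, 4, 5], eps2=0.5, eps4=0.1)
quadratic = blended_dissipation([i * i for i in range(7)], eps2=0.5, eps4=0.1)
```

Both differences vanish on linear data, and the fourth difference also vanishes on quadratic data. Smooth regions are therefore barely touched, while odd-even (zigzag) modes are damped strongly.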

Following Briley and McDonald [3] and Beam and Warming [4], the time-linearized implicit operator is approximated as the product of three one-dimensional factors, resulting in a scheme of the form

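$$\{I + \Delta t(\delta_\xi A_1 - \epsilon^{(2)}_\xi\delta_\xi^2 + \epsilon^{(4)}_\xi\delta_\xi^4)\}\{I + \Delta t(\delta_\eta A_2 - \epsilon^{(2)}_\eta\delta_\eta^2 + \epsilon^{(4)}_\eta\delta_\eta^4)\}\{I + \Delta t(\delta_\zeta A_3 - \epsilon^{(2)}_\zeta\delta_\zeta^2 + \epsilon^{(4)}_\zeta\delta_\zeta^4)\}\,\Delta W_{i,j,k}$$
$$= -\Delta t\{\delta_\xi F_1 + \delta_\eta F_2 + \delta_\zeta F_3 - (\epsilon^{(2)}_\xi\delta_\xi^2 + \epsilon^{(2)}_\eta\delta_\eta^2 + \epsilon^{(2)}_\zeta\delta_\zeta^2)W + (\epsilon^{(4)}_\xi\delta_\xi^4 + \epsilon^{(4)}_\eta\delta_\eta^4 + \epsilon^{(4)}_\zeta\delta_\zeta^4)W\}_{i,j,k} \qquad (7)$$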

where $A_l = \partial F_l/\partial W$ $(l = 1, 2, 3)$ are the Jacobians of the transformed flux vectors with respect to the solution, and $\epsilon^{(2)}$ and $\epsilon^{(4)}$ are the dissipation coefficients. For computational efficiency, each factor in Eq. (7) is diagonalized by a local similarity transformation, yielding a decoupled set of equations which can be solved using a scalar pentadiagonal solver. The diagonalization is performed using the modal matrices of the Jacobians $A_l$ $(l = 1, 2, 3)$. Thus, if $Q_l$ is the modal matrix of $A_l$, then $Q_l^{-1} A_l Q_l = \Lambda_l$ is a diagonal matrix whose non-zero elements are the eigenvalues of $A_l$. Applying this transformation at each mesh point, the resulting equations are:


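$$Q_1\{I + \Delta t(\delta_\xi\Lambda_1 - \epsilon^{(2)}_\xi\delta_\xi^2 + \epsilon^{(4)}_\xi\delta_\xi^4)\}\,Q_1^{-1}Q_2\{I + \Delta t(\delta_\eta\Lambda_2 - \epsilon^{(2)}_\eta\delta_\eta^2 + \epsilon^{(4)}_\eta\delta_\eta^4)\}\,Q_2^{-1}Q_3\{I + \Delta t(\delta_\zeta\Lambda_3 - \epsilon^{(2)}_\zeta\delta_\zeta^2 + \epsilon^{(4)}_\zeta\delta_\zeta^4)\}\,\Delta\hat W_{i,j,k}$$
$$= -\Delta t\{\delta_\xi F_1 + \delta_\eta F_2 + \delta_\zeta F_3 - (\epsilon^{(2)}_\xi\delta_\xi^2 + \epsilon^{(2)}_\eta\delta_\eta^2 + \epsilon^{(2)}_\zeta\delta_\zeta^2)W + (\epsilon^{(4)}_\xi\delta_\xi^4 + \epsilon^{(4)}_\eta\delta_\eta^4 + \epsilon^{(4)}_\zeta\delta_\zeta^4)W\}_{i,j,k} \qquad (8)$$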


where $\Delta\hat W_{i,j,k} = Q^{-1}_{3,i,j,k}\,\Delta W_{i,j,k}$. The elements of $Q_l$, $Q_l^{-1}$, and $\Lambda_l$ can be expressed explicitly in terms of w and the elements of the Jacobian matrix of the coordinate transformation, and are given by Chaussee and Pulliam [5]. The solution of Eqs. (8) is performed sequentially by solving five scalar pentadiagonal systems along each line in each of the three mesh directions for each time step. The scheme is incorporated within the multigrid algorithm following the procedure developed by Jameson [6].
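The two ingredients of the diagonal scheme can be illustrated in a reduced setting. This is a sketch under stated assumptions, not the paper's three-dimensional code. Part 1 uses the one-dimensional Euler equations (a stand-in for the 5x5 Jacobians $A_l$) and checks that the textbook modal matrix Q satisfies $AQ = Q\Lambda$ with eigenvalues $u-c$, $u$, $u+c$. Part 2 is a generic scalar pentadiagonal solver (Gaussian elimination without pivoting, assuming diagonal dominance) of the kind each decoupled characteristic field requires along a grid line. All function names are hypothetical.

```python
import math

# Part 1: 1D analogue of Q^{-1} A Q = Lambda.
def euler1d_jacobian(u, H, gamma):
    """Flux Jacobian A = df/dw of the 1D Euler equations, w = (rho, rho*u, e)."""
    g1 = gamma - 1.0
    return [[0.0, 1.0, 0.0],
            [0.5 * (gamma - 3.0) * u * u, (3.0 - gamma) * u, g1],
            [u * (0.5 * g1 * u * u - H), H - g1 * u * u, gamma * u]]

def modal_matrix(u, H, c):
    """Right eigenvectors as columns of Q, and eigenvalues u - c, u, u + c."""
    Q = [[1.0, 1.0, 1.0], [u - c, u, u + c], [H - u * c, 0.5 * u * u, H + u * c]]
    return Q, [u - c, u, u + c]

def eigen_residual(rho, u, p, gamma=1.4):
    """max |(A Q - Q Lambda)_{im}|; near zero when Q diagonalizes A."""
    e = p / (gamma - 1.0) + 0.5 * rho * u * u
    H = (e + p) / rho                   # total enthalpy
    c = math.sqrt(gamma * p / rho)      # speed of sound
    A = euler1d_jacobian(u, H, gamma)
    Q, lam = modal_matrix(u, H, c)
    return max(abs(sum(A[i][j] * Q[j][m] for j in range(3)) - lam[m] * Q[i][m])
               for i in range(3) for m in range(3))

# Part 2: one scalar pentadiagonal solve per decoupled equation.
def solve_pentadiagonal(a, b, c, d, e, rhs):
    """Row i: a[i] x[i-2] + b[i] x[i-1] + c[i] x[i] + d[i] x[i+1] + e[i] x[i+2] = rhs[i].
    Coefficients that fall outside the matrix are ignored. No pivoting."""
    n = len(c)
    band = [[a[i], b[i], c[i], d[i], e[i]] for i in range(n)]  # offset = col - row + 2
    y = list(rhs)
    for k in range(n - 1):                     # forward elimination below the diagonal
        for i in (k + 1, k + 2):
            if i >= n:
                continue
            off = k - i + 2                    # position of column k in row i
            factor = band[i][off] / band[k][2]
            for j in range(3):                 # columns k, k+1, k+2 of row k
                band[i][off + j] -= factor * band[k][2 + j]
            y[i] -= factor * y[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):             # back substitution
        s = y[i]
        if i + 1 < n:
            s -= band[i][3] * x[i + 1]
        if i + 2 < n:
            s -= band[i][4] * x[i + 2]
        x[i] = s / band[i][2]
    return x

# Usage: a diagonally dominant test system with a known solution.
n = 6
a, b, c, d, e = [1.0] * n, [-1.0] * n, [6.0] * n, [-1.0] * n, [1.0] * n
x_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rhs = [sum(coef * x_true[i + s]
           for coef, s in zip((a[i], b[i], c[i], d[i], e[i]), (-2, -1, 0, 1, 2))
           if 0 <= i + s < n)
       for i in range(n)]
x = solve_pentadiagonal(a, b, c, d, e, rhs)
```

The design point is that after diagonalization each line solve costs O(n) scalar operations, rather than the block-pentadiagonal solve with 5x5 blocks that the undiagonalized factors of (7) would require.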
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The treatment of the explicit boundary conditions in the far field is based on the Riemann invariants of the one-dimensional problem normal to the boundary. On the wing surface and on the symmetry plane, the pressure is interpolated from the interior of the field using the normal momentum equation. The implicit boundary conditions are treated in a manner consistent with characteristic theory; this is relatively easy to implement, since the corrections (or intermediate corrections) determined in each step are approximations to the changes in the characteristic variables in the coordinate direction along which the equations are being solved [1].
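A minimal sketch of a Riemann-invariant far-field treatment for a subsonic boundary follows. It assumes the standard one-dimensional invariants $R^\pm = u_n \pm 2c/(\gamma-1)$, where $u_n$ is the velocity along the outward normal. The outgoing invariant is extrapolated from the interior and the incoming one is taken from the free stream. The paper states only that its treatment is based on these invariants, so the details and names here are assumptions.

```python
def farfield_normal_state(un_in, c_in, un_inf, c_inf, gamma=1.4):
    """Boundary normal velocity and sound speed from 1D Riemann invariants.
    Valid for a subsonic boundary (|u_n| < c); un_* are along the outward normal."""
    g = 2.0 / (gamma - 1.0)
    r_plus = un_in + g * c_in        # outgoing invariant, extrapolated from the interior
    r_minus = un_inf - g * c_inf     # incoming invariant, fixed by the free stream
    un_b = 0.5 * (r_plus + r_minus)
    c_b = 0.25 * (gamma - 1.0) * (r_plus - r_minus)
    return un_b, c_b

un_b, c_b = farfield_normal_state(0.35, 1.02, 0.30, 1.00)
```

By construction, the boundary state reproduces both invariants exactly. At a supersonic inflow or outflow boundary, all characteristics come from one side, and the state would instead be taken entirely from the free stream or from the interior.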


III.  Results 

The algorithm described above has been applied to the problem of transonic flow past a swept wing mounted on a vertical wall, or symmetry plane. The results presented here have been calculated for the ONERA M6 wing [7] on a C-grid containing 192 × 32 × 32 mesh cells in the wraparound, normal, and spanwise directions, respectively. The grid has been generated by a weak shearing of a square root transformation about a point just inside the leading edge of the wing surface in each plane of constant z.

Figures 1(a) and 1(b) illustrate the general nature of the grid system. Figure 1(a) presents the distribution of cells on the wing planform (x-z plane), while Figure 1(b) presents a perspective view of the wing mounted on the wall; the C-grid shown in the symmetry plane is typical of the mesh in each x-y plane. The far field boundaries are located approximately 10 chords upstream and downstream of the wing, approximately 19 chords laterally from the wing, and approximately 3.5 semispans from the plane of symmetry in the z direction.

Results have been calculated for a free stream Mach number of 0.839 and an angle of attack of 3.06 degrees, in order to allow comparison with existing wind-tunnel test data [7]. Figure 2 presents the streamwise pressure distribution at each of the computational stations on the upper and lower surfaces of the wing, and Figures 3(a) and 3(b) present contours of constant $p/p_\infty$ on the upper and lower surfaces of the wing. The development of the "lambda" shock pattern on the wing upper surface, characteristic of supercritical flows past swept wings, is clearly visible. Figures 4(a) and 4(b) present a comparison with wind-tunnel data [7] at two spanwise stations. The calculated results predict the strengths and locations of the shocks quite accurately, in spite of the neglect of viscous effects in the calculation.

Figure 5 presents contour maps of entropy in several y-z planes, starting at the leading edge and moving downstream. Since the entropy is constant along streamlines for a steady, inviscid flow, these plots can be viewed as representing cuts through stream surfaces, and they reveal the generation and evolution of the wing-tip vortex. The vortex center clearly moves up and inboard as it develops downstream, as is observed in experiment.




Figure 6 presents convergence histories for the Diagonal ADI scheme on a single grid and with 5 levels of multigrid. The logarithm of the average residual, the drag coefficient, and the number of cells in which the local Mach number is supersonic are plotted as a function of computational work, measured in Work Units. One Work Unit corresponds to the computational labor required for a single time step on the fine grid. For both calculations, local time stepping is used at a Courant number of C = 16. The figure illustrates two points: (1) the Diagonal ADI scheme itself is a reasonably efficient time-stepping algorithm; and (2) an appreciable increase in the convergence rate is achieved when multigrid is used. The aerodynamic force coefficients have converged to within plottable accuracy in about 200 time steps for the single-grid calculation, and in the equivalent of about 30 time steps for the 5-level multigrid calculation.

The calculations were performed on an IBM 3090-600E and required about 24 minutes of CPU time for 30 Work Units, which is equivalent to approximately $2.4 \times 10^{-4}$ s per mesh cell per Work Unit. This is comparable to the time required for the explicit multi-stage (Runge-Kutta) scheme of Jameson et al. [2] when applied to three-dimensional problems (see, e.g., Jameson and Baker [8]). The efficiency gained by the diagonalization procedure is compounded by the fact that the additional work required to compute the elements of the modal matrices is vectorizable, and thus is efficiently performed on computers with vector capabilities.
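The per-cell timing quoted above follows directly from the grid size and CPU time given in the text. The snippet below only reproduces that arithmetic, plus a rough multigrid speed-up read off the approximate "200 steps versus 30 Work Units" comparison.

```python
cells = 192 * 32 * 32          # C-grid of Section III: 196,608 cells
cpu_seconds = 24 * 60          # "about 24 minutes" of CPU time
work_units = 30
seconds_per_cell_per_wu = cpu_seconds / (cells * work_units)
print(f"{seconds_per_cell_per_wu:.2e} s per cell per Work Unit")  # 2.44e-04

# Rough speed-up of 5-level multigrid over the single grid (both counts approximate).
speedup = 200 / 30
```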
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Figure 3. Plan views of constant pressure contours on (a) upper and (b) lower wing surfaces.


Figure 4. Comparisons of measured and computed pressure coefficients at selected span locations of ONERA Wing M6. Free stream Mach number is $M_\infty = 0.839$ and angle of attack is $\alpha = 3.06^\circ$.
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Figure 5. Contours of constant entropy in selected cross planes.
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Figure  6.  Convergence  histories  for  single-grid  and  5-level  multigrid  schemes. 
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ABSTRACT. We discuss experiments conducted on mesh moving and local mesh refinement algorithms that are used with a finite difference scheme to solve initial-boundary value problems for vector systems of hyperbolic partial differential equations in one dimension. The mesh moving algorithms move a coarse base mesh by a mesh movement function so as to follow and isolate spatially distinct phenomena.

The local mesh refinement method recursively divides the time step and spatial cells in regions where error indicators are high until a prescribed error tolerance is satisfied.

The adaptive mesh algorithms are implemented in a code with an initial mesh generator, a MacCormack finite difference scheme, and an error estimator.

Experiments  are  conducted  for  several  different  problems  to  determine  the  efficiency  of 
the  adaptive  methods  and  their  combinations  and  to  gauge  their  effectiveness  in  solving 
one-dimensional  problems. 


1. INTRODUCTION. Our goal is to develop expert systems software for solving
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time-dependent partial differential equations. The software should allow users to describe problems in a natural language, have a convenient geometric description interface, and not require the knowledge of a sophisticated numerical analyst. The systems should be intelligent, efficient, reliable, robust, and able to solve a large class of problems to prescribed error tolerances.

The power of adaptive techniques is that they are capable of making decisions that change the computational environment. This significantly reduces the number of a priori decisions demanded of the user and provides dramatic savings in the cost of the computation. This capability is provided by procedures that monitor intermediate results and feed this data back to a control mechanism that modifies the solution strategy. Three popular adaptive techniques for solving partial differential equations are mesh moving or rezoning (r-refinement), mesh refinement (h-refinement), and order enrichment (p-refinement). In r-refinement, the mesh is moved either continuously or statically at discrete times in order to resolve nonuniformities and reduce errors. H-refinement involves the addition or deletion of computational cells to the mesh, and p-refinement involves increasing or decreasing the order of a method in different portions of the domain. All strategies attempt to organize the computation so that little effort is expended in regions where the solution is smooth and much greater effort is devoted to regions where the solution is more difficult to compute.

The different refinement strategies are being combined to yield remarkable results. Babuska and Szabo [8] showed that an hp-refinement scheme produced an exponential rate of convergence on a singular elasticity problem. Arney and Flaherty [6] developed an hr-refinement scheme that moved a 'base' coarse mesh so as to follow important dynamic structures of the solution and recursively refined the base mesh to improve resolution. They found that mesh motion was inexpensive relative to mesh refinement and reduced dispersive errors associated with wave motion. However, it did not always accurately follow structures, especially when interactions occurred, and could not dependably satisfy prescribed tolerances. Recursive mesh refinement can satisfy prescribed tolerances but involves more complicated data structures and greater care at coarse-fine mesh interfaces than r-refinement.

There  are  numerous  other  variations  of  the  three  adaptive  strategies  for 
time-dependent  problems.  For  example,  temporal  refinement  can  be  done  globally  to 
produce  an  adaptive  method  of  lines  strategy  [1,13]  or  locally  in  combination  with  the 
spatial  refinement  strategy  [7,16]. 

Accurate a posteriori error estimation is essential for codes that strive to satisfy user-prescribed error tolerances. Error estimation is often the most expensive part of an adaptive algorithm. Arney and Flaherty [6] calculated the local discretization error at nodes of the mesh using an algorithm based on Richardson [22] extrapolation. This pointwise estimate can then be used to construct several global measures of the discretization error. The advantage of this method is that it can be used to find error estimates for any numerical scheme without explicitly knowing the exact form of the error. Details of this error estimate and its implementation on a moving mesh are discussed in Arney [3] and Arney et al. [4].
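The idea behind a Richardson-type estimate can be shown on a scalar ODE. This is a simplified global analogue, not the pointwise space-time estimate of Arney and Flaherty. A p-th order method is run with steps h and h/2, and the error of the finer solution is estimated as $(u_h - u_{h/2})/(2^p - 1)$, without any knowledge of the exact solution. The test problem $y' = -y$, the second-order Heun method, and the helper names are assumptions of this sketch.

```python
import math

def heun(f, y0, t_end, steps):
    """Second-order Heun (explicit trapezoidal) integration of y' = f(t, y)."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y += 0.5 * h * (k1 + k2)
        t += h
    return y

def richardson_error(coarse, fine, order):
    """Estimate of (fine - exact) from solutions computed with steps h and h/2."""
    return (coarse - fine) / (2 ** order - 1)

def decay(t, y):
    return -y

u_coarse = heun(decay, 1.0, 1.0, 10)   # h = 0.1
u_fine = heun(decay, 1.0, 1.0, 20)     # h = 0.05
estimate = richardson_error(u_coarse, u_fine, order=2)
actual = u_fine - math.exp(-1.0)       # available here only because the exact solution is known
```

The estimate agrees with the true error to within a few percent. That property is what allows an adaptive code to steer motion and refinement using only computed solutions.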

In this paper, we apply Arney and Flaherty's [6] adaptive mesh moving and refinement technique to one-dimensional hyperbolic systems. As described in Section 2, their approach consists of moving a base mesh of quadrilateral cells so as to isolate important spatial structures of the solution. Refinement, when needed, is performed within cells of coarser meshes. Solutions are generated by a MacCormack [19] finite difference scheme, and local error estimates, which are used to control mesh motion and refinement, are computed by Richardson [22] extrapolation. Our goal is to quantify the relative costs and benefits of mesh motion and local mesh refinement. In Section 3, we report the results of computational experiments performed on three one-dimensional problems using several conventional and adaptive numerical procedures. The results obtained demonstrate both the potential and the limitations of the adaptive algorithm. Our results are mixed and show that the effects of mesh moving can be problem-dependent. Generally, mesh motion is effective for following an isolated structure, but much less so when structures interact. In Section 4, we discuss the utility of our methods, the computational results, and future work.


2. ALGORITHM. We consider an application of Arney and Flaherty's [6] adaptive procedure to one-dimensional vector systems of hyperbolic conservation laws having the form

$$u_t + f_x(x, u, t) = 0, \quad x \in D, \quad t > 0, \qquad (1)$$

$$u(x, 0) = u_0(x), \quad x \in D \cup \partial D, \qquad (2)$$

with appropriate well-posed conditions on the boundary $\partial D$ of a domain D. Like them, we discretize Eqs. 1 and 2 using a MacCormack [19] finite difference scheme because of its general applicability [20]. Although this scheme suffers a reduction in order on a moving nonuniform grid, our computations show that proper mesh moving can provide enough efficiency and accuracy to compensate for this order reduction.
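For reference, the basic MacCormack predictor-corrector applied to the model problem $u_t + a u_x = 0$ looks like the sketch below. It uses a uniform, stationary, periodic mesh, so it leaves out the moving-mesh terms, the Davis TVD viscosity, and the boundary treatment used in the paper. The function names and the sine-wave test are illustrative assumptions.

```python
import math

def maccormack_step(u, a, dt, dx):
    """One MacCormack step for u_t + a u_x = 0 on a uniform periodic mesh:
    forward-difference predictor, then backward-difference corrector."""
    n = len(u)
    nu = a * dt / dx  # Courant number
    pred = [u[i] - nu * (u[(i + 1) % n] - u[i]) for i in range(n)]
    # pred[i - 1] with i = 0 wraps to the last point (periodic boundary)
    return [0.5 * (u[i] + pred[i] - nu * (pred[i] - pred[i - 1])) for i in range(n)]

def advect_sine(n_cells=200, courant=0.5, t_end=1.0, a=1.0):
    """Advect sin(2 pi x) on [0, 1) for one period and return the max-norm error."""
    dx = 1.0 / n_cells
    steps = round(t_end / (courant * dx / a))
    dt = t_end / steps
    x = [i * dx for i in range(n_cells)]
    u = [math.sin(2.0 * math.pi * xi) for xi in x]
    for _ in range(steps):
        u = maccormack_step(u, a, dt, dx)
    exact = [math.sin(2.0 * math.pi * (xi - a * t_end)) for xi in x]
    return max(abs(ui - ei) for ui, ei in zip(u, exact))
```

On this smooth problem, halving dx cuts the error by roughly a factor of four, consistent with second-order accuracy on a uniform mesh. Near discontinuities, the same centered scheme produces the oscillations that the artificial viscosity is meant to suppress.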

The MacCormack scheme produces spurious oscillations near discontinuities because it is a centered scheme with second-order accuracy on a uniform mesh. Adding artificial viscosity to make this scheme total variation diminishing (TVD) makes it attractive as a general solver for problems with discontinuities; we use a model due to Davis [12]. The artificial viscosity terms are calculated from the solution data at the beginning of each time step and are added to the solution after the MacCormack solution has been calculated.

Arney and Flaherty's [5] mesh moving procedure is based on an intuitive approach that allows nodes to follow local nonuniformities, rather than the more analytic
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approaches of equidistribution of error [13] or the solution of variational problems that minimize some given functional [14], which can be expensive and problem-dependent. They derive equations for the nodal velocities so that the mesh moves to follow the geometric propagation of some local nonuniformity. This generally reduces dispersive errors and allows the use of larger time steps while maintaining accuracy and stability. Important factors for mesh moving are to maintain mesh smoothness by controlling adjacent cell ratios, to keep nodes within the domain boundaries, and to move nodes with a velocity that reduces discretization error. In order to prevent mesh distortion that can lead to increased discretization error of the solver, mesh points cannot move independently but must be coupled to at least some of their neighbors.

Some  schemes  do  this  coupling  by  attraction  and  repulsion  of  nodes  (cf.,  Rai  and 
Anderson  [21]).  In  these  algorithms,  the  coupling  is  done  globally,  where  each  node 
influences  the  velocity  of  all  other  nodes  in  the  mesh.  Attempting  to  equidistribute 
errors  can  lead  to  problems  where  nodes  move  incorrectly  in  some  regions.  This  occurs, 
for  example,  when  a  mesh  that  is  following  one  structure  must  react  to  another 
nonuniformity  that  arises  in  another  part  of  the  domain.  An  abrupt  grid  adjustment 
can  be  eliminated  if  the  influence  is  more  local  and  the  movement  algorithm  is  combined 
with  a  mesh  refinement  scheme  to  add  the  necessary  nodes  in  the  region  of  the  new 
structure. 

At each time step, the selection mechanism of Arney and Flaherty's [6] mesh moving algorithm uses as feedback the current node locations and the nodal values of a mesh movement indicator at the independent moving nodes of a coarse mesh. The local error estimates are used as the mesh movement indicators. Nodes with 'significant error' are grouped into error clusters. This clustering separates the important spatially distinct phenomena of the solution. As time evolves, the clusters can move, change size, collide, or separate. At each time step, new clusters can be created and old ones can vanish.

Mesh movement is then determined by each node's relationship to its nearest error cluster and by the propagation velocity of the center of error mass of the cluster. Therefore, the nodal influence is regional. The amount of movement is determined by a movement function which ensures that the center of error of the cluster moves according to a differential equation suggested by Coyle et al. [11]

$$\ddot{r} + \lambda\,\dot{r} = 0, \qquad (3)$$

where r(t) is the position of the center of error mass of a cluster and $\dot{(\;)} := d(\;)/dt$. Additionally, this movement function smooths the mesh motion and prevents nodes from moving outside the domain boundary. The distance a node moves is reduced near boundaries in order to prevent it from leaving the domain. Nodes on the domain boundaries are not allowed to move.
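The clustering step and the "center of error mass" can be sketched in one dimension as follows. Here a cluster is simply a run of consecutive nodes whose error estimate exceeds a threshold, and its center is the error-weighted mean node position. This is a hypothetical simplification: the actual procedure works with buffered clusters and couples nodes to their nearest cluster through the movement function.

```python
def error_clusters(err, tol):
    """Group runs of consecutive nodes with error above tol into clusters (lists of indices)."""
    clusters, current = [], []
    for i, e in enumerate(err):
        if e > tol:
            current.append(i)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters

def center_of_error_mass(x, err, cluster):
    """Error-weighted mean position of the nodes in one cluster."""
    total = sum(err[i] for i in cluster)
    return sum(err[i] * x[i] for i in cluster) / total

x = [i / 10 for i in range(8)]
err = [0.0, 0.5, 0.5, 0.0, 0.0, 1.0, 3.0, 0.0]
clusters = error_clusters(err, tol=0.1)
centers = [center_of_error_mass(x, err, c) for c in clusters]
```

Tracking how each center moves between time steps gives the cluster propagation velocity that drives the nodal velocities.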
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Arney and Flaherty [6] perform static rezoning whenever computation with the current mesh would be counterproductive or when the current mesh suffers from poor mesh ratios. There are sophisticated algorithms to check mesh condition and to verify the validity of the mesh (cf., Babuska and Szabo [8] and Simpson [23]). These algorithms can check for gaps between cells and for overlapping cells. Since moving meshes develop such severe problems only gradually, mesh degradation can be discovered before the mesh becomes completely invalid. This mesh degradation or ill-conditioning occurs when the mesh angles are severe, when the mesh contains poor mesh ratios or poor aspect ratios, or when the time step is too restrictive because of the crowding of nodes. Nodes of the coarse mesh can become too crowded when error clusters pass through boundaries or when two or more error clusters converge and trap nodes between them. Static rezoning is performed only when absolutely necessary, because of the high cost of accurately interpolating the solution from the existing mesh onto a new one. It was not done in any of the examples presented in this paper.

Arney and Flaherty [6] used local mesh refinement to ensure that the user-prescribed error tolerances were satisfied. This was done by recursively introducing finer meshes through binary refinement of space-time cells in regions where nodes with unacceptable error have formed clusters (cf., Berger [9], Flaherty and Moore [16], and Gropp [17]). The clustering algorithm used for refinement is the same as the one used for mesh movement. The clusters are buffered so that high-error nodes are in the interior of the refined region. The problem is recursively solved on these fine meshes until the error is within the specified tolerance. The refined subgrids that are adaptively created by the local refinement algorithm overlay the coarser grids. Each of these subgrids is independently defined. Figure 1 shows a coarse grid with portions overlaid by two fine grids and three finer grids.

Arney and Flaherty's [7] mesh refinement strategy suggests the use of a tree data structure for its description and implementation. In this tree structure, the coarsest grid is the root node and is defined as level 0 in the tree. The subgrids of the coarse grid are its offspring in the tree and are defined as level 1. A grid at level l is properly nested in the tree between its parent at level l - 1 and its offspring (if any) at level l + 1. Grids at the same level are given an arbitrary ordering. Due to the clustering and buffering of error regions, grids at the same level of a two-dimensional problem can intersect and overlap. Figure 1 depicts an example of a sequence of meshes that might be produced by our refinement procedure for a coarse grid refinement step. The numbers next to the grids indicate the order in which the solution is computed on each grid.

Such  tree  data  structures  are  commonly  used  in  adaptive  mesh  refinement 
procedures  (cf.,  Berger  and  Oliger  [10]  and  Flaherty  and  Moore  [16]).  Additionally,  we 
use  a  stack  to  implement  the  recursive  algorithm  (cf.,  Aho  et  al.  [2]  and  Horowitz  and 
Sahni  [18]). 
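A tree of grids can be traversed with an explicit stack instead of recursion. The sketch below is hypothetical: the Grid class and solve_order are illustrative names, the tree is fixed in advance (the real procedure creates subgrids adaptively from error clusters), and offspring are processed depth-first, which may not match the numbering shown in Figure 1. Binary refinement is modeled by giving each offspring two sub-steps per parent step.

```python
from dataclasses import dataclass, field

@dataclass
class Grid:
    """Hypothetical refinement-tree node; level 0 is the coarse base mesh."""
    name: str
    level: int
    children: list = field(default_factory=list)

def solve_order(root, dt, t0=0.0, ratio=2):
    """Return (grid name, start time, step) in the order the grids are advanced over
    one coarse step. An explicit stack replaces recursion; each offspring takes
    `ratio` sub-steps per parent step (binary refinement when ratio == 2)."""
    order = []
    stack = [(root, t0, dt)]
    while stack:
        grid, t, h = stack.pop()
        order.append((grid.name, t, h))  # stand-in for "solve on this grid over [t, t + h]"
        sub = h / ratio
        # Push in reverse so that the first offspring's first sub-step is popped next.
        for child in reversed(grid.children):
            for s in reversed(range(ratio)):
                stack.append((child, t + s * sub, sub))
    return order

tree = Grid("coarse", 0, [Grid("fine", 1, [Grid("finer", 2)])])
for name, t, h in solve_order(tree, dt=1.0):
    print(f"{name:6s} t = {t:.2f} .. {t + h:.2f}")
```

The stack reproduces the order a recursive solver would follow: each fine step is completed, together with all the finer steps nested inside it, before the next fine step begins.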
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Figure  1.  Typical  set  of  local  refinement  grids  for  one  coarse  time  step.  The 
numbers  indicate  the  order  in  which  the  solution  is  computed  on 
each  grid. 


The solution vectors, error estimates, and nodal information are all stored in a dynamic storage area, with pointers from the tree to this storage area for each mesh in the tree. For each grid, we store its level in the tree and the number of nodes it contains. The dynamic storage area contains the solution vector and the error estimate at each node, and nodal information for use in the solver and grid interface procedures. Since old mesh data is saved to obtain initial data for the newly refined grids, and nodes of the parent grids are updated from the fine grid solutions, nodal relationships between meshes are stored directly in the nodal vector.


3. COMPUTATIONAL RESULTS. We conducted experiments with Arney and Flaherty's [6] adaptive mesh strategy using three problems; our results follow. In each case, errors are measured in the L^ norm and the CPU times are normalized to unity. All calculations were performed on an IBM 3081D computer.

Example 1. Consider the scalar hyperbolic differential equation

$$u_t + (\cos \pi t)\,u_x = 0, \quad t > 0, \quad -0.4 \le x \le 1.4, \qquad (4)$$

with initial conditions

$$u(x, 0) = \begin{cases} 1, & \text{if } 0.4 \le x \le 0.6, \\ 0, & \text{otherwise}, \end{cases} \qquad (5)$$

and boundary conditions

$$u(-0.4, t) = u(1.4, t) = 0. \qquad (6)$$

The exact solution to this problem is

$$u(x, t) = \begin{cases} 1, & \text{if } 0.4 \le x - (\sin \pi t)/\pi \le 0.6, \\ 0, & \text{otherwise}, \end{cases} \qquad (7)$$

which is a square pulse of unit amplitude that oscillates sinusoidally about the center of the domain. Artificial viscosity was added to eliminate oscillations in the solution; however, this resulted in an attenuation and spreading of the square pulse.
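The exact solution (7) follows from the characteristics of (4): $dx/dt = \cos \pi t$ gives $x = x_0 + \sin(\pi t)/\pi$, so the initial pulse is simply translated. A small helper (hypothetical name) evaluates it and also checks the extreme positions of the pulse edges.

```python
import math

def exact_pulse(x, t):
    """Exact solution (7) of Example 1: trace x back along its characteristic to x0."""
    x0 = x - math.sin(math.pi * t) / math.pi
    return 1.0 if 0.4 <= x0 <= 0.6 else 0.0

# Extreme positions of the pulse edges over all t.
left_min = 0.4 - 1.0 / math.pi     # about 0.082
right_max = 0.6 + 1.0 / math.pi    # about 0.918
```

Both extremes lie well inside $[-0.4, 1.4]$, so the pulse never reaches the boundaries, and the homogeneous boundary conditions (6) are consistent with (7) for all time.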

Four different adaptive strategies were used to solve this problem for $0 \le t \le 2.5$. The solutions at several times, the mesh trajectories, and the time step profile for the various strategies are shown in Figures 2, 3, 4, and 5. Table 1 summarizes the computational cost and accuracy of the four strategies.

With a stationary uniform mesh, we find that the square pulse is rapidly attenuated and diffused. The time step profile shows how the Courant number is used to maintain the maximum step size without loss of stability. Figure 3 shows that the results improve when the mesh is allowed to move: the pulse is attenuated less and the error is reduced, but more time steps are needed to complete the computation. The mesh trajectories in Figure 3 demonstrate how well the nodes track the square pulse as it oscillates. Figures 4 and 5, depicting the results of Strategies 3 and 4, respectively, show remarkable improvement when adaptive mesh refinement is used. In both cases, the local error tolerance was specified as 0.001. Errors are reduced and the attenuation of the pulse is almost negligible, but shape distortion is still


Adaptive strategy                          ||e||_1   Space-time cells   Normalized CPU time   Attenuation
Stationary uniform mesh                    0.1090          774               1.000              0.545
Moving mesh                                0.0903         1134               1.452              0.730
Stationary uniform mesh with refinement    0.0614        15718               8.761              0.969
Moving mesh with refinement                0.0395        16554              10.069              0.994

Table 1. Comparison of the different adaptive strategies for Example 1.


significant. Notice how well the refinement procedure tracks the pulse; however, the cost of computation increases by almost an order of magnitude. When mesh moving and refinement are combined, the results are even more remarkable: the pulse is attenuated by only 0.6 percent.
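The percentages quoted above can be read off Table 1, assuming the Attenuation column reports the computed peak height of the unit pulse (so the amplitude loss is one minus that value). A short Python sketch of the arithmetic, with the table values copied verbatim:

```python
# Values copied from Table 1 (Example 1): error, space-time cells, normalized CPU, peak height.
TABLE1 = {
    "stationary uniform mesh": (0.1090, 774, 1.000, 0.545),
    "moving mesh": (0.0903, 1134, 1.452, 0.730),
    "stationary uniform mesh with refinement": (0.0614, 15718, 8.761, 0.969),
    "moving mesh with refinement": (0.0395, 16554, 10.069, 0.994),
}

def amplitude_loss_percent(peak):
    """Percentage of the unit amplitude lost, assuming `peak` is the computed pulse height."""
    return 100.0 * (1.0 - peak)

base_err, _, base_cpu, _ = TABLE1["stationary uniform mesh"]
for name, (err, cells, cpu, peak) in TABLE1.items():
    print(f"{name:40s} loss {amplitude_loss_percent(peak):5.1f}%  "
          f"error x{err / base_err:.2f}  CPU x{cpu / base_cpu:.2f}")
```

For the combined strategy this gives a 0.6 percent loss, with the error reduced to about 36 percent of the uniform-mesh value at roughly ten times the CPU time.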
Figure 4. Solutions at t = 0, 0.57, 1.15, 1.80, and 2.33, mesh trajectories, and time step profile for Strategy 3 of Example 1.
Figure 5. Solutions at t = 0, 0.77, 1.37, 1.91, and 2.50, mesh trajectories, and time step profile for Strategy 4 of Example 1.
Example 2. Consider the linear uncoupled system

$$u_t + (\cos \pi t)\, u_x = 0, \qquad v_t - (\cos \pi t)\, v_x = 0, \qquad t > 0, \quad -0.4 \le x \le 1.4, \tag{8}$$

with initial conditions

$$u(x,0) = v(x,0) = \begin{cases} 1, & \text{if } 0.4 \le x \le 0.6, \\ 0, & \text{otherwise}, \end{cases} \tag{9}$$

and boundary conditions

$$u(-0.4,t) = u(1.4,t) = v(-0.4,t) = v(1.4,t) = 0. \tag{10}$$

The exact solution to this problem is

$$u(x,t) = \begin{cases} 1, & \text{if } 0.4 \le x - (\sin \pi t)/\pi \le 0.6, \\ 0, & \text{otherwise}, \end{cases} \tag{11}$$

$$v(x,t) = \begin{cases} 1, & \text{if } 0.4 \le x + (\sin \pi t)/\pi \le 0.6, \\ 0, & \text{otherwise}. \end{cases} \tag{12}$$

The first component u is the same as in Example 1, and the second component v moves symmetrically with u, as its mirror image about x = 0.5. Four different adaptive strategies were used to solve this problem for $0 \le t \le 1.5$. Table 2 summarizes the computational cost and accuracy of the four strategies. The solutions at several times, the mesh trajectories, and the time step profile for Strategies 3 and 4 are shown in Figures 6 and 7, respectively.
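Equations (11) and (12) make the symmetry explicit: since $0.4 \le (1 - x) - (\sin \pi t)/\pi \le 0.6$ exactly when $0.4 \le x + (\sin \pi t)/\pi \le 0.6$, the component v is the mirror image of u about the center x = 0.5, that is, v(x, t) = u(1 - x, t). A small NumPy check of this identity (the function names are illustrative):

```python
import numpy as np

def pulse(xi):
    """Unit square pulse on [0.4, 0.6]."""
    xi = np.asarray(xi, dtype=float)
    return np.where((xi >= 0.4) & (xi <= 0.6), 1.0, 0.0)

def exact_u(x, t):
    """Exact u from (11): the pulse shifted by +sin(pi t)/pi."""
    return pulse(np.asarray(x, dtype=float) - np.sin(np.pi * t) / np.pi)

def exact_v(x, t):
    """Exact v from (12): the pulse shifted by -sin(pi t)/pi."""
    return pulse(np.asarray(x, dtype=float) + np.sin(np.pi * t) / np.pi)

# Grid chosen so that no node falls exactly on a pulse edge at the sampled times.
x = np.linspace(-0.4, 1.4, 1001)
mirror_ok = all(
    np.array_equal(exact_v(x, t), exact_u(1.0 - x, t)) for t in (0.0, 0.25, 0.5, 1.0, 1.5)
)
```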

It is clear that mesh moving does not provide the expected improvement in the results for this problem. In fact, Table 2 shows that whenever the mesh is moved, the error in the computed solution increases. This is because two identical error regions moving symmetrically about the center of the domain do not contribute equally to the mesh motion, owing to asymmetries in their error estimates. As a result, the mesh moves incorrectly and the solution deteriorates. This, in turn, leads to further imbalance of the error clusters and eventually has catastrophic effects.

Comparing Figures 6 and 7, we see how badly the solution is attenuated owing to incorrect mesh motion. Improper mesh motion has also led to refinement in regions of the mesh where it should not have been necessary. In both cases, the local error tolerance was specified to be 0.005.


Adaptive strategy                              ||e||_1   Space-time cells   Normalized CPU time
1. Stationary uniform mesh                     0.1145          1650               1.000
2. Moving mesh                                 0.1221          5640               3.386
3. Stationary uniform mesh with refinement     0.0541         20828               6.926
4. Moving mesh with refinement                 0.0583         48954              18.667

Table 2. Comparison of the different adaptive strategies for Example 2.
Figure 6. Solutions for u at t = 0, 0.17, 0.48, 0.96, and 1.38, mesh trajectories, and time step profile for Strategy 3 of Example 2.
Figure 7. Solutions for u at t = 0, 0.76, 0.95, 1.17, and 1.49, mesh trajectories, and time step profile for Strategy 4 of Example 2.
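Example 3. Consider the coupled hyperbolic system obtained from the wave equation

$$u_t - v_x = 0, \qquad v_t - u_x = 0, \qquad t > 0, \quad -0.3 \le x \le 1.4, \tag{13}$$

with initial conditions

$$u(x,0) = \begin{cases} 1, & \text{if } 0.4 \le x \le 0.6, \\ 0, & \text{otherwise}, \end{cases} \tag{14}$$

$$v(x,0) = \begin{cases} 1, & \text{if } 0.5 \le x \le 0.7, \\ 0, & \text{otherwise}, \end{cases} \tag{15}$$

and boundary conditions satisfying the exact solution

$$u(x,t) = \bigl(p(x+t) + q(x-t)\bigr)/2.0, \tag{16}$$

$$v(x,t) = \bigl(p(x+t) - q(x-t)\bigr)/2.0, \tag{17}$$

where

$$p(\xi) = \begin{cases} 2, & \text{if } 0.5 \le \xi \le 0.6, \\ 1, & \text{if } 0.4 \le \xi < 0.5 \text{ or } 0.6 < \xi \le 0.7, \\ 0, & \text{otherwise}, \end{cases} \tag{18}$$

and

$$q(\xi) = \begin{cases} 1, & \text{if } 0.4 \le \xi < 0.5, \\ -1, & \text{if } 0.6 < \xi \le 0.7, \\ 0, & \text{otherwise}. \end{cases} \tag{19}$$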
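The functions p and q in (18)-(19) follow from (16)-(17) at t = 0: p = u(·,0) + v(·,0) and q = u(·,0) - v(·,0). The p(x + t) part travels to the left and the q(x - t) part to the right, both at unit speed. A NumPy sketch that evaluates (16)-(19) and confirms that the initial data (14)-(15) are recovered; the half-open intervals at 0.5 and 0.6 are a convention chosen here and affect only isolated points.

```python
import numpy as np

def p(xi):
    """p = u(., 0) + v(., 0), equation (18)."""
    xi = np.asarray(xi, dtype=float)
    two = (xi >= 0.5) & (xi <= 0.6)
    one = ((xi >= 0.4) & (xi < 0.5)) | ((xi > 0.6) & (xi <= 0.7))
    return np.where(two, 2.0, np.where(one, 1.0, 0.0))

def q(xi):
    """q = u(., 0) - v(., 0), equation (19)."""
    xi = np.asarray(xi, dtype=float)
    return np.where((xi >= 0.4) & (xi < 0.5), 1.0,
                    np.where((xi > 0.6) & (xi <= 0.7), -1.0, 0.0))

def exact_uv(x, t):
    """d'Alembert solution (16)-(17) of u_t - v_x = 0, v_t - u_x = 0."""
    x = np.asarray(x, dtype=float)
    left, right = p(x + t), q(x - t)   # left-moving and right-moving waves
    return (left + right) / 2.0, (left - right) / 2.0

x = np.linspace(-0.3, 1.4, 1000)       # no node falls exactly on a breakpoint of p or q
u0, v0 = exact_uv(x, 0.0)
u0_expected = ((x >= 0.4) & (x <= 0.6)).astype(float)   # equation (14)
v0_expected = ((x >= 0.5) & (x <= 0.7)).astype(float)   # equation (15)
```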

Four different adaptive strategies were used to solve this problem for $0 \le t \le 0.6$. Table 3 summarizes the computational cost and accuracy of the four strategies. The solutions at several times, the mesh trajectories, and the time step profile for Strategies 3 and 4 are shown in Figures 8 and 9, respectively.

Once again, mesh motion does not appear to produce the desired improvement in the solution. In this case, there are two error regions moving away from the center of the domain with unit speed in opposite directions. However, unlike Example 2, the error regions are not identical. With a moving mesh, the solution is attenuated and, consequently, the error measured in the L1 norm increases.
Adaptive strategy                              ||e||_1   Space-time cells   Normalized CPU time
1. Stationary uniform mesh                     0.1141           930               1.000
2. Moving mesh                                 0.1057          3180               3.454
3. Stationary uniform mesh with refinement     0.0527         23236              11.493
4. Moving mesh with refinement                 0.0552         39234              20.232

Table 3. Comparison of the different adaptive strategies for Example 3.
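The effect of adding mesh motion can be isolated by comparing each strategy with its moving-mesh counterpart in Tables 2 and 3. The sketch below, with the values copied from the tables, shows that in Example 2 the error grows in both comparisons, whereas in Example 3 it grows only when refinement is also used; in every case the CPU time rises.

```python
# (error, normalized CPU time) copied from Tables 2 and 3.
RESULTS = {
    "Example 2": {"stationary": (0.1145, 1.000), "moving": (0.1221, 3.386),
                  "stationary+refinement": (0.0541, 6.926), "moving+refinement": (0.0583, 18.667)},
    "Example 3": {"stationary": (0.1141, 1.000), "moving": (0.1057, 3.454),
                  "stationary+refinement": (0.0527, 11.493), "moving+refinement": (0.0552, 20.232)},
}
PAIRS = [("stationary", "moving"), ("stationary+refinement", "moving+refinement")]

def effect_of_moving(rows, fixed, moved):
    """Relative error change and CPU-time ratio when mesh moving is added."""
    (err0, cpu0), (err1, cpu1) = rows[fixed], rows[moved]
    return (err1 - err0) / err0, cpu1 / cpu0

for example, rows in RESULTS.items():
    for fixed, moved in PAIRS:
        d_err, cpu_ratio = effect_of_moving(rows, fixed, moved)
        print(f"{example}: {fixed} -> {moved}: error {100 * d_err:+.1f}%, CPU x{cpu_ratio:.2f}")
```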


Figure 8. Solutions for u at t = 0, 0.20, and 0.58, mesh trajectories, and time step profile for Strategy 3 in Example 3 with error tolerance = 0.005.
Figure 9. Solutions for u at t = 0, 0.41, and 0.60, mesh trajectories, and time step profile for Strategy 4 in Example 3 with error tolerance = 0.005.
4. CONCLUSION. We have experimented with the adaptive mesh method of Arney and Flaherty [6]. Our results indicate that proper mesh moving can efficiently reduce errors. However, their mesh moving is not effective for problems that have more than one moving structure. We find that whenever there are two error regions, the mesh moving strategy is unable to make an accurate decision. This occurs particularly while the two structures have not completely separated and still form one large error cluster. Tests of local refinement show that it can efficiently reduce errors. The most powerful method was the combination of mesh moving and mesh refinement. The results obtained for Example 1 show that a totally adaptive mesh strategy can be extremely effective. The overhead associated with the clustering and dynamic data structures is only about 5 percent of the time needed to calculate a comparable solution on a uniform mesh.

Additional computation is needed to verify the generality of these conclusions. It is also not clear how many of the difficulties were due to MacCormack's [19] finite difference scheme or to the Richardson [22] extrapolation-based error estimate. A TVD scheme would greatly improve performance near discontinuities.
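A Richardson-extrapolation error estimate compares a solution advanced with fine steps against one advanced with a coarse step. For a method of order p, the error of the fine result is approximately (fine - coarse)/(2^p - 1). The sketch below illustrates the idea on the scalar model problem y' = -y, using Heun's predictor-corrector method as a stand-in for MacCormack's scheme (both are second-order, two-stage predictor-corrector methods). It is not the error estimator actually used in the computations reported above, and `needs_refinement` is a hypothetical flagging rule.

```python
import math

def heun_step(f, t, y, h):
    """One step of Heun's predictor-corrector method (second-order accurate)."""
    y_pred = y + h * f(t, y)                                # predictor: forward Euler
    return y + 0.5 * h * (f(t, y) + f(t + h, y_pred))       # corrector: trapezoidal average

def richardson_estimate(f, t, y, h, order=2):
    """Advance from t to t + 2h twice (two steps of h, one step of 2h) and estimate the error.

    If the coarse error is about 2**order times the fine error, then
    exact - fine ~= (fine - coarse) / (2**order - 1).
    """
    fine = heun_step(f, t + h, heun_step(f, t, y, h), h)
    coarse = heun_step(f, t, y, 2.0 * h)
    return fine, (fine - coarse) / (2 ** order - 1)

def needs_refinement(error_estimate, tol):
    """Hypothetical rule: flag a region for refinement when the estimate exceeds the tolerance."""
    return abs(error_estimate) > tol

decay = lambda t, y: -y
fine, estimate = richardson_estimate(decay, 0.0, 1.0, 0.05)
true_error = math.exp(-0.1) - fine                          # exact solution is exp(-t)
```

Here the estimate (about -4.1 x 10^-5) is within roughly 5 percent of the true error (about -3.9 x 10^-5).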

We  are  re-examining  the  entire  process  in  order  to  determine  an  effective  mesh 
motion  procedure.  Future  computations  will  be  performed  using  more  advanced  shock 
capturing  difference  schemes  (e.g.,  Engquist  and  Osher  [15]). 
ABSTRACT

The determination of aerodynamic coefficients by shell designers is a critical step in the development of any new projectile design. Of particular interest is the determination of the aerodynamic coefficients at transonic speeds. It is in this speed regime that the critical aerodynamic behavior occurs and a rapid increase in the aerodynamic coefficients is observed. Three-dimensional transonic flowfield computations over projectiles have been made using an implicit, approximately factored, partially flux-split algorithm. A composite grid scheme has been used to provide the increased grid resolution needed for accurate numerical simulation of three-dimensional transonic flows. Details of the asymmetrically located shock waves on the projectiles have been determined. Computed surface pressures have been compared with experimental data and are found to be in good agreement. The pitching moment coefficient, determined from the computed flowfields, shows the critical aerodynamic behavior observed in free flight.


I.  INTRODUCTION 

The flight of projectiles covers a wide range of speeds. Accurate prediction of projectile aerodynamics at these speeds is of significant importance in the early design stage of a projectile. The critical aerodynamic behavior occurs in the transonic speed regime, 0.9 < M < 1.1, where the aerodynamic coefficients have been found to increase by as much as 100%. Of particular interest is the determination of the pitching moment coefficient, since it determines the static stability of the projectile in flight. The critical behavior in this case is usually characterized by a rapid increase in the coefficient followed by a sharp drop. This rapid change in the pitching moment coefficient can be attributed in part to the complex flow structure and, in particular, to the asymmetrically located shock waves that exist on projectiles flying at transonic speeds at angle of attack. Computations of three-dimensional flowfields at transonic speeds are thus needed to predict the critical aerodynamic behavior.

In recent years a considerable research effort has been focused on the development of modern predictive capabilities for determining projectile aerodynamics. Numerical capabilities have been developed primarily using Navier-Stokes computational techniques and used to compute flow over slender bodies of revolution at transonic speeds. Flowfield computations included both axisymmetric and three-dimensional situations. References 1 and 2, however, did not include the computations in the wake or base region of a projectile and thus ignored the upstream effect of the base region flow on the afterbody flowfield and the asymmetrical locations of the shock wave. An axisymmetric base flow code (Ref. 3) was developed to compute the entire projectile flowfield, including the base region, using a flowfield segmentation procedure. This technique was later extended (Ref. 4) into three dimensions to calculate the pitch-plane aerodynamics at transonic speeds. Owing to a lack of computer resources, only one solution was obtained and reported in Reference 4. In addition, the calculations in References 1, 2, and 4 generally did not have sufficient grid resolution for the same reason. The availability of supercomputers such as the Cray X-MP/48 and Cray 2 now makes it possible to provide the increased grid resolution needed for accurate computations of three-dimensional transonic flows (Ref. 5).

The numerical scheme plays an equally important role in accurate predictions of transonic flows. All the calculations in References 1-4 were made using the compressible, thin-layer Navier-Stokes equations, which were solved using the implicit Beam and Warming central finite difference scheme (Refs. 6-8). Such schemes require artificial dissipation to be added to control numerical oscillations. Upwind schemes can have several advantages over central difference schemes, including natural numerical dissipation and better stability properties. The numerical scheme used here is an implicit scheme based on flux splitting (Ref. 9) and upwind spatial differencing in the streamwise direction.
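Splitting a flux according to the sign of its eigenvalues can be illustrated with the one-dimensional Euler equations, whose flux Jacobian A has eigenvalues u - c, u and u + c. Writing A = R Λ R⁻¹, the split Jacobians are A± = R Λ± R⁻¹, where Λ± keeps only the positive or only the negative eigenvalues; because the Euler flux is homogeneous of degree one in Q, F = A Q and therefore F± = A± Q (the Steger-Warming splitting). The NumPy sketch below, for an ideal gas with γ = 1.4 and a subsonic state chosen for illustration, is a generic example and not the partially flux-split formulation of the three-dimensional code described here.

```python
import numpy as np

def euler_jacobian(rho, u, p, gamma=1.4):
    """Flux Jacobian dF/dQ of the 1-D Euler equations, with Q = (rho, rho*u, E).

    Returns (A, Q, F) for the given primitive state.
    """
    E = p / (gamma - 1.0) + 0.5 * rho * u**2
    H = (E + p) / rho                                   # total enthalpy
    A = np.array([
        [0.0, 1.0, 0.0],
        [0.5 * (gamma - 3.0) * u**2, (3.0 - gamma) * u, gamma - 1.0],
        [u * (0.5 * (gamma - 1.0) * u**2 - H), H - (gamma - 1.0) * u**2, gamma * u],
    ])
    Q = np.array([rho, rho * u, E])
    F = np.array([rho * u, rho * u**2 + p, u * (E + p)])
    return A, Q, F

def split_by_eigenvalue_sign(A):
    """Return (A_plus, A_minus) with A = A_plus + A_minus (real, diagonalizable A assumed)."""
    lam, R = np.linalg.eig(A)
    lam, R = lam.real, R.real
    R_inv = np.linalg.inv(R)
    A_plus = R @ np.diag(np.maximum(lam, 0.0)) @ R_inv
    A_minus = R @ np.diag(np.minimum(lam, 0.0)) @ R_inv
    return A_plus, A_minus

A, Q, F = euler_jacobian(rho=1.0, u=0.5, p=1.0)        # subsonic: u - c < 0 < u < u + c
A_plus, A_minus = split_by_eigenvalue_sign(A)
F_plus, F_minus = A_plus @ Q, A_minus @ Q              # split fluxes (homogeneity: F = A Q)
```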

Other factors that have a direct impact on 3-D numerical simulation are geometric complexity and the efficient management of large 3-D data sets. These factors make it necessary to develop zonal or patched methods in which a large 3-D problem is divided into a number of smaller problems, each of which is then solved separately. The break-up of the large database can be achieved in various ways (Refs. 10-13). References 10 and 11 are earlier applications in which the database structure follows a pencil format. These numerical calculations, although promising, were based on limited computer resources. Reference 12 describes the development of a chimera grid scheme, which provides multiple regions where communication between grids is done by interpolating in regions of overlap. The blocked grid approach (Ref. 13) does not require interpolation at the interfaces. The schemes in References 12 and 13 are generally complicated, since they allow a block or zone to be embedded in another. Recently, a simple composite grid scheme (Ref. 14) has been developed in which a large single grid is partitioned into smaller grids so that each of the smaller problems can be solved separately with simple data transfers at the interfaces. The initial results obtained were very promising. The present effort extends the use of this composite grid scheme to include correct modeling of the base region of a projectile. Three-dimensional flowfields have been computed for two different projectiles at various transonic speeds, 0.8 < M < 1.2, in order to determine the critical aerodynamic behavior.


II.  NUMERICAL  METHOD 


1. GOVERNING EQUATIONS

The three-dimensional Navier-Stokes conservation equations of mass, momentum, and energy can be represented in flux vector form as:

$$\partial_\tau \hat{Q} + \partial_\xi(\hat{F} + \hat{F}_v) + \partial_\eta(\hat{G} + \hat{G}_v) + \partial_\zeta(\hat{H} + \hat{H}_v) = 0 \tag{1}$$

where the independent variable $\tau$ is the time and the spatial variables $\xi$, $\eta$, $\zeta$ are chosen to map a curvilinear body-conforming discretization into a uniform computational space. Here $\hat{Q}$ contains all the dependent variables and $\hat{F}$, $\hat{G}$ and $\hat{H}$ are the inviscid fluxes. The flux terms $\hat{F}_v$, $\hat{G}_v$ and $\hat{H}_v$ contain the viscous derivatives, and a nondimensional form of the equations is used throughout. The conservative form of the equations is maintained chiefly to capture the Rankine-Hugoniot shock jump relations as accurately as possible.

For body-conforming coordinates and high Reynolds number flow, where $\zeta$ is the coordinate away from the surface, the thin-layer approximation can be made in the $\zeta$ direction and the governing equations can be written as:

$$\partial_\tau \hat{Q} + \partial_\xi \hat{F} + \partial_\eta \hat{G} + \partial_\zeta \hat{H} = Re^{-1} \partial_\zeta \hat{S}. \tag{2}$$

Here the viscous terms in $\zeta$ have been collected into the vector $\hat{S}$, and the nondimensional reciprocal Reynolds number is extracted to indicate a viscous flux term.

In differencing these equations it is often advantageous to difference about a known base solution, denoted by subscript 0, as:

$$\delta_\tau(\hat{Q} - \hat{Q}_0) + \delta_\xi(\hat{F} - \hat{F}_0) + \delta_\eta(\hat{G} - \hat{G}_0) + \delta_\zeta(\hat{H} - \hat{H}_0) - Re^{-1}\delta_\zeta(\hat{S} - \hat{S}_0) = -\partial_\tau \hat{Q}_0 - \partial_\xi \hat{F}_0 - \partial_\eta \hat{G}_0 - \partial_\zeta \hat{H}_0 + Re^{-1}\partial_\zeta \hat{S}_0 \tag{3}$$

where $\delta$ indicates a general difference operator and $\partial$ is the differential operator. If the base state is properly chosen, the differenced quantities can have smaller and smoother variation and therefore less differencing error. The freestream is used as the base solution in the present formulation.

2. IMPLICIT FINITE DIFFERENCE ALGORITHM

The implicit, approximately factored scheme for the thin-layer Navier-Stokes equations that uses central differencing in the $\eta$ and $\zeta$ directions and upwinding in $\xi$ is written in the form:
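$$\begin{aligned}
&\left[I + h\delta_\xi^b(\hat{A}^+)^n + h\delta_\zeta \hat{C}^n - hRe^{-1}\bar{\delta}_\zeta J^{-1}\hat{M}^n J - D_i|_\zeta\right] \times \left[I + h\delta_\xi^f(\hat{A}^-)^n + h\delta_\eta \hat{B}^n - D_i|_\eta\right]\Delta\hat{Q}^n \\
&\quad = -\Delta t\left\{\delta_\xi^b\left[(\hat{F}^+)^n - \hat{F}^+_\infty\right] + \delta_\xi^f\left[(\hat{F}^-)^n - \hat{F}^-_\infty\right] + \delta_\eta\left(\hat{G}^n - \hat{G}_\infty\right) + \delta_\zeta\left(\hat{H}^n - \hat{H}_\infty\right) - Re^{-1}\bar{\delta}_\zeta\left(\hat{S}^n - \hat{S}_\infty\right)\right\} - D_e\left(\hat{Q}^n - \hat{Q}_\infty\right)
\end{aligned} \tag{4}$$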


where $h = \Delta t$ and the freestream base solution is used. Here $\delta$ is typically a three-point, second-order-accurate central difference operator, $\bar{\delta}$ is the midpoint operator used with the viscous terms, and the operators $\delta_\xi^b$ and $\delta_\xi^f$ are backward and forward three-point difference operators. The flux $\hat{F}$ has been split into $\hat{F}^+$ and $\hat{F}^-$ according to its positive and negative eigenvalues, and the matrices $\hat{A}$, $\hat{B}$, $\hat{C}$ and $\hat{M}$ result from local linearization of the fluxes about the previous time level. Here $J$ denotes the Jacobian of the coordinate transformation. Dissipation operators $D_e$ and $D_i$ are used in the central space differencing directions. The factored left-hand-side operators can be readily inverted by sweeping and inversion of tridiagonal matrices with 5 × 5 blocks. This two-factor implicit scheme is readily vectorized or multitasked in planes of $\xi$ = constant.
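Each factor in (4) leads, along one coordinate line, to a block-tridiagonal system whose blocks are 5 × 5 (one row per conservation equation). Such systems can be solved by the block version of the Thomas algorithm: a forward sweep that eliminates the sub-diagonal blocks, followed by back substitution. A NumPy sketch, checked against a dense solve; the random, diagonally weighted test blocks are illustrative only.

```python
import numpy as np

def block_thomas(lower, diag, upper, rhs):
    """Solve a block-tridiagonal system with the block Thomas algorithm.

    Row i reads lower[i] @ x[i-1] + diag[i] @ x[i] + upper[i] @ x[i+1] = rhs[i];
    lower[0] and upper[-1] are ignored. Blocks are (m, m) and rhs has shape (n, m).
    """
    n = len(diag)
    c_mod = [None] * n                      # modified super-diagonal blocks
    d_mod = [None] * n                      # modified right-hand sides
    pivot = diag[0]
    for i in range(n):
        if i > 0:                           # forward sweep: eliminate lower[i]
            pivot = diag[i] - lower[i] @ c_mod[i - 1]
            d_mod[i] = np.linalg.solve(pivot, rhs[i] - lower[i] @ d_mod[i - 1])
        else:
            d_mod[0] = np.linalg.solve(pivot, rhs[0])
        if i < n - 1:
            c_mod[i] = np.linalg.solve(pivot, upper[i])
    x = np.empty((n, rhs.shape[1]))
    x[-1] = d_mod[-1]
    for i in range(n - 2, -1, -1):          # back substitution
        x[i] = d_mod[i] - c_mod[i] @ x[i + 1]
    return x

# Illustrative check: n = 8 blocks of size 5x5, compared with a dense solve.
rng = np.random.default_rng(0)
n, m = 8, 5
lower, upper = rng.standard_normal((n, m, m)), rng.standard_normal((n, m, m))
diag = rng.standard_normal((n, m, m)) + 20.0 * np.eye(m)
rhs = rng.standard_normal((n, m))
dense = np.zeros((n * m, n * m))
for i in range(n):
    dense[i*m:(i+1)*m, i*m:(i+1)*m] = diag[i]
    if i > 0:
        dense[i*m:(i+1)*m, (i-1)*m:i*m] = lower[i]
    if i < n - 1:
        dense[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = upper[i]
x_block = block_thomas(lower, diag, upper, rhs)
x_dense = np.linalg.solve(dense, rhs.ravel()).reshape(n, m)
```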

3.  COMPOSITE  GRID  SCHEME 

In the present work, a composite grid scheme (Ref. 14) has been used in which a large single grid is split into a number of smaller grids so that computations can be performed on each of these grids separately. Each of these grids uses the available core memory in turn, while the rest are stored on an external disk storage device such as the SSD of the Cray X-MP/48 computer. On the Cray 2 supercomputer, a large in-core memory is available to fit the large single grid. However, for accurate geometric modeling of complex projectile configurations that include a blunt nose, a sharp base corner, base cavities, etc., it is also desirable to split the large database into a few smaller zones on the Cray 2 as well.

A code developed for a single grid can be made to work for a block grid structure by (1) mapping and storing the information for each grid in a large memory and (2) supplying interface boundary arrays, pointers and updating procedures. Consider the situation in Figure 1, in which the single grid from $J = 1$ to $J = J_{\max}$ is partitioned into four grids, G1 through G4. The base region of the projectile is included by adding another zone, G5. This procedure preserves the actual base corner, and no approximation is made. This zonal scheme has been modified to allow more than one zone in the wake for accurate modeling of other complicated base configurations, including cavities.


The use of a composite or blocked grid scheme requires special care in storing and fetching the interface boundary data, i.e., the communication between the various zones. For the simple partitioning shown in Figure 1, all subgrid points are members of the original grid. There is no mismatch of the grid points at the interface boundaries, and no interpolation is required. This procedure thus has an advantage over patched or overset grid schemes, which do need interpolation. The partitioned grid has six interface boundaries in the streamwise direction, $J_1 = J_{1\max}$, $J_2 = 1$, $J_2 = J_{2\max}$, $J_3 = 1$, $J_3 = J_{3\max}$ and $J_4 = 1$, and two interface boundaries in the normal direction between grids G4 and G5. Data for these planes are supplied from the other grids by injecting interior values of the other grid onto the interface boundaries. The details of the data storage and transfer, and other pertinent information such as the metrics and differencing accuracy, can be found in Reference 14.
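The injection procedure can be illustrated in one dimension. In the hypothetical layout below, a grid of N points is split into two blocks whose interface boundary points each coincide with an interior point of the other block, so boundary data are copied (injected) rather than interpolated. With a simple explicit three-point update, used here only for illustration in place of the implicit scheme (4), the two-block computation reproduces the single-grid result exactly.

```python
import numpy as np

def explicit_step(u, r=0.25):
    """Explicit three-point update of the interior points; the two end points are unchanged."""
    new = u.copy()
    new[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return new

def split(u, k):
    """Block A holds global points 0..k, block B holds k-1..N-1 (hypothetical layout)."""
    return u[: k + 1].copy(), u[k - 1 :].copy()

def inject(a, b):
    """Fill each interface boundary point from the neighbour's interior: no interpolation."""
    a[-1] = b[1]     # global point k: interior of B, boundary of A
    b[0] = a[-2]     # global point k-1: interior of A, boundary of B

def merge(a, b):
    """Reassemble the global grid from the two blocks."""
    return np.concatenate([a[:-1], b[1:]])

x = np.linspace(0.0, 1.0, 41)
single = np.sin(np.pi * x) ** 2
a, b = split(single, 20)
for _ in range(50):
    single = explicit_step(single)
    a, b = explicit_step(a), explicit_step(b)
    inject(a, b)               # exchange interface data after every step
composite = merge(a, b)
```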

The differencing accuracy near the interfaces is quite important. Three-point backward and forward difference operators are used at the interior points. Near the interface, for example at $J_2 = J_{2\max} - 1$, the three-point forward difference operator cannot be used with one-grid-point overlap, as shown in Figure 1. The differencing accuracy could be dropped from second order to first order; however, this leads to inaccuracies in the flowfield solution near the interfaces (Ref. 14). To maintain second-order accuracy near the interfaces, we difference, for example, $\partial \hat{F}/\partial \xi$ at $J_2 = J_{2\max} - 1$ as

$$\frac{\partial \hat{F}}{\partial \xi} = \delta_\xi^b(\hat{F}^+) + \delta_\xi(\hat{F}^-)$$

where $\delta_\xi^b$ is the usual three-point backward difference operator and $\delta_\xi$ is now a central difference operator, i.e.,

$$\delta_\xi^b \hat{F}^+_j = \frac{3\hat{F}^+_j - 4\hat{F}^+_{j-1} + \hat{F}^+_{j-2}}{2\Delta\xi}, \qquad \delta_\xi \hat{F}^-_j = \frac{\hat{F}^-_{j+1} - \hat{F}^-_{j-1}}{2\Delta\xi}.$$

Near the other interface of grid 2 ($J_2 = 2$), the $\delta_\xi^b$ operator is correspondingly replaced by a central difference operator, while $\delta_\xi^f$ is the usual three-point forward difference operator. The planes $J_2 = 1$ and $J_2 = J_{2\max}$ are, of course, boundaries for grid 2 and get their data from interior flowfield solutions in the neighboring grids. Second-order accuracy at and near the interfaces is thus maintained. The partial use of central differencing near the interfaces has not adversely affected the stability of the scheme.
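The claim that combining a three-point backward operator on the $\hat{F}^+$ part with a central operator on the $\hat{F}^-$ part keeps second-order accuracy can be checked numerically. The sketch below uses smooth scalar functions (sin and exp) as hypothetical stand-ins for $\hat{F}^+$ and $\hat{F}^-$ and estimates the observed order from the errors at step sizes h and h/2.

```python
import numpy as np

def backward3(f, x, h):
    """Three-point backward difference (3 f_j - 4 f_{j-1} + f_{j-2}) / (2h)."""
    return (3.0 * f(x) - 4.0 * f(x - h) + f(x - 2.0 * h)) / (2.0 * h)

def forward3(f, x, h):
    """Three-point forward difference (-3 f_j + 4 f_{j+1} - f_{j+2}) / (2h)."""
    return (-3.0 * f(x) + 4.0 * f(x + h) - f(x + 2.0 * h)) / (2.0 * h)

def central(f, x, h):
    """Central difference (f_{j+1} - f_{j-1}) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def observed_order(approx, exact, h):
    """Order of accuracy estimated from the errors at step sizes h and h/2."""
    e_h, e_h2 = abs(approx(h) - exact), abs(approx(h / 2.0) - exact)
    return np.log2(e_h / e_h2)

f_plus, f_minus = np.sin, np.exp             # stand-ins for F+ and F- along xi
x0 = 0.7
exact = np.cos(x0) + np.exp(x0)              # d/dxi (F+ + F-) at x0
interior = observed_order(lambda h: backward3(f_plus, x0, h) + forward3(f_minus, x0, h), exact, 0.01)
interface = observed_order(lambda h: backward3(f_plus, x0, h) + central(f_minus, x0, h), exact, 0.01)
```

Both estimates come out close to 2, i.e., the interface treatment is second-order accurate like the interior one.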


III. MODEL AND COMPUTATIONAL GRIDS

The first model used for the experimental and computational study presented here is an idealization of a realistic artillery projectile geometry. The experimental model shown in Figure 2 is a secant-ogive-cylinder-boattail (SOCBT) projectile. It consists of a three-caliber (one caliber = maximum body diameter), sharp, secant-ogive nose, a two-caliber cylindrical midsection and a one-caliber 7° conical afterbody or boattail. A similar model was used for the computational studies, the only difference being a five percent rounding of the nose tip. The nose tip rounding was done for computational efficiency and is considered to have little impact on the final integrated forces. Experimental pressure data (Ref. 15) are available for this shape and were obtained in the NASA Langley Eight-Foot Pressure Tunnel using a sting-mounted model. The test conditions of 1 atm supply pressure and 320 K supply temperature resulted in a Reynolds number of 4.5 × 10^6 based on model length.

The computational grid used for this computation is shown in Figure 3. Figure 3a shows the longitudinal cross-section of the 3-D grid, while Figure 3b shows an expanded view of the three-dimensional base region grid. As shown in Figure 3a, grid points are clustered near the body surface to resolve the viscous boundary layer. Grid clustering has also been used in the longitudinal direction near the boattail and base corners, where large gradients in the flow variables are expected. In addition, the composite grid scheme preserves the sharp base corner. The grid consists of 202 points in the streamwise direction, 36 points in the circumferential direction and 50 points in the normal direction. This amounts to about 16 million words of storage for the code on the Cray X-MP/48. Only up to 4 Mw of central core memory was easily accessible; therefore, the full grid was partitioned into five smaller grids (including a base region grid), each of which uses the core memory in turn while the rest are stored on the SSD device. These computations were performed on the Cray X-MP/48 at the US Army Ballistic Research Laboratory (BRL). Each numerical simulation (including all partitioned grids) took over 20 hours of computer time.
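The memory figures quoted above are consistent with simple arithmetic. The sketch below computes the implied storage per grid point and, assuming for simplicity five zones of roughly equal size (the actual zones need not be equal), the storage per zone.

```python
import math

n_stream, n_circ, n_normal = 202, 36, 50          # SOCBT grid of Figure 3
points = n_stream * n_circ * n_normal             # 363,600 grid points
total_words = 16e6                                # "about 16 million words"
core_words = 4e6                                  # "up to 4 Mw" of easily accessible core

words_per_point = total_words / points            # about 44 words per grid point
min_zones = math.ceil(total_words / core_words)   # at least 4 zones to fit in core
words_per_zone = total_words / 5                  # five zones, if equal in size (assumption)
print(points, round(words_per_point, 1), min_zones, words_per_zone <= core_words)
```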

Another grid, shown in Figure 4, was generated to simulate the model with the sting in the base region, again for the SOCBT projectile. This is again a longitudinal cross-section of the 3-D grid. The grid is wrapped around the base corner in this case. It consists of 238 points in the axial direction, 39 points in the circumferential direction and 50 points in the normal direction. Computations on this grid were performed on the Cray 2 computer at BRL using the same code. The computing time for these simulations was comparable to that on the Cray X-MP/48.

The second projectile under consideration is the M549 projectile shown in Figure 5. This projectile has a short boattail about 1/2 caliber in length. For simplicity, the flat nose was again modeled with nose tip rounding, and the rotating band was eliminated. Experimental aerodynamic coefficient data, compiled from wind tunnel and free flight range tests, are available for this configuration. Computations for this projectile have been made for atmospheric conditions. Figure 6 shows an expanded view of the grid around this projectile in both the windside and leeside planes. The full grid consists of 298 points in the axial direction, 39 points in the circumferential direction and 50 points in the normal direction. Calculations for this projectile were performed on the Cray 2 computer at BRL. Each of these calculations took over 30 hours of computer time.


IV.  RESULTS 

The implicit time-marching procedure was used to obtain the desired steady-state result, starting from freestream conditions everywhere. Boundary conditions were updated explicitly at each time step. The solution residual dropped by at least three orders of magnitude before converged solutions were obtained. In addition, the surface pressure distribution was checked for time invariance. For the computation of turbulent flow, a turbulence model must be supplied. In the present calculation, a two-layer algebraic eddy viscosity model due to Baldwin and Lomax was used. Results are now presented for two cases: (1) the SOCBT projectile with and without sting; and (2) the M549 projectile.

1. SOCBT PROJECTILE, $0.9 < M_\infty < 1.2$, $\alpha = 4°$

Results have been obtained at various transonic speeds for both cases, with and without modeling of the sting. Figures 7-10 show the Mach contours for the projectile in the windward and leeward planes. These figures show the expansions at the ogive-cylinder and cylinder-boattail corners. They also indicate the presence of shock waves on the cylinder and on the boattail, which typically occur on projectiles at transonic speeds. Sharp, asymmetrically located shocks are clearly observed in the boattail flowfield (the one on the windside being closer to the base than its counterpart on the leeside). The asymmetry can also be seen in the wake flow behind the bluff base. As the Mach number is increased from 0.94 to 0.96 and then to 0.98, the shocks become stronger and move toward the base of the projectile. At higher transonic speeds, past the speed of sound (see Figure 10), these shocks become weak; however, a bow shock forms in front of the nose of the projectile.

Computations have also been made to investigate the effect of the sting on transonic projectile aerodynamics. A typical plot of Mach contours for this simulation is shown in Figure 11a for $M_\infty = 0.96$ and $\alpha = 4°$. As expected, the sting has a large effect on the qualitative features of the wake flowfield, which in turn has moved the boattail shocks further upstream from the base corner. An experimentally obtained shadowgraph at the same Mach number and flow conditions is shown in Figure 11b. As shown in Figures 11a and 11b, the agreement of the shock wave positions between computation and experiment is very good. Figures 12a and 12b show the velocity vectors in the base region for both windside and leeside. Figure 12a is for the case with no sting, whereas Figure 12b includes the sting in the base region. In both cases, asymmetry in the flowfield can be observed between the windside and leeside. Three pairs of separated flow bubbles can be seen in the near wake for the case with no sting (Figure 12a). For the case with sting (Figure 12b), one can see the large primary bubble along with a small counter-rotating bubble near the junction of the sting and the base. The primary bubble is more elongated on the windside, where the flow reattaches further downstream of the base.

Figures 13-15 show the surface pressure distributions as a function of longitudinal position, compared with experimental data (Ref. 15). Figures 13a and 13b show the comparison at $M_\infty = 0.96$ for the windside and leeside, respectively. Computed results are shown for two grids, one that wraps around the base corner and one that does not. As shown in these figures, the computed results are virtually the same for both computations except near the base corner, where a small difference can be noticed. The agreement with experimental data, however, is very good for both windside and leeside. The expansions and recompressions near the ogive-cylinder and cylinder-boattail junctions are captured very accurately. Figures 14a and 14b show the surface pressure distribution for $M_\infty = 0.98$. Computed results are shown for turbulent flow, both with and without the sting. In the experiment, the model was sting-mounted and no boundary layer trip was used; therefore, it is not clear whether the flow was laminar or turbulent. Computed results obtained for the sting-mounted case under laminar flow conditions are also included in the comparison with experimental data. The comparison of pressure on the windside (Figure 14a) shows generally good agreement of the computed pressures with the experimental data. The largest differences between the computed results are seen on the rear part of the boattail, where no experimental results are available. The comparison on the leeside (Figure 14b) again shows good agreement of the computed results with experimental data over most of the projectile except on the second half of the boattail. As expected, the computed result with no sting has the largest discrepancy. Computed results with sting simulation compare well with experimental data, especially for laminar flow conditions. A typical result at a high transonic speed, $M_\infty = 1.1$, is shown in Figure 15. The agreement of the computed surface pressures with experiment is very good. At this high transonic Mach number, the shocks on the cylinder as well as on the boattail are very weak, as evidenced by the absence of a sharp rise in pressure in those areas.
evidenced  by  the  absence  of  sharp  rise  in  pressure  in  those  areas.  The  expan¬ 
sions  and  recompressions  near  the  ogive-cylinder  and  cyl inder-boattail  junc¬ 
tions  can  be  clearly  observed  in  Figure  15. 

The computed surface pressures have been integrated to obtain the aerodynamic
forces and moments. The slope of the pitching moment coefficient, $C_{m_\alpha}$, is
generally of greater concern in projectile aerodynamics, since it is the
parameter that determines the static stability of the projectile. Figure 16
shows the variation of $C_{m_\alpha}$ with Mach number. It clearly shows the critical
aerodynamic behavior in the transonic speed regime, i.e., the sharp rise in the
coefficient between M = 0.92 and 0.96 and its subsequent sharp drop. This is
followed by a smooth decrease in the coefficient as the Mach number is increased
further. The increase in $C_{m_\alpha}$ between M = 0.92 and 0.96 is of the order of 20%,
which is a typical value obtained from a number of range tests for similar
projectiles.
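The integration of surface pressure into force and moment coefficients can be illustrated with a short Python sketch. It is not the procedure used in the paper. It assumes a body of revolution with radius r(x) and slope dr/dx, a pressure-coefficient field Cp(x, φ) sampled on a regular grid with φ = 0 on the leeward ray, reference area πD²/4, reference length D, a moment taken about the nose and positive nose-up, and a linear regime in which $C_{m_\alpha} \approx C_m/\alpha$. The names `pressure_loads` and `trapz` are hypothetical, and base-pressure and skin-friction contributions are omitted.

```python
import math


def trapz(y, x):
    """Composite trapezoidal rule for samples y taken at abscissae x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def pressure_loads(x, r, drdx, phi, cp, diameter):
    """Normal-force and pitching-moment coefficients of a body of revolution.

    x        : axial stations, measured aft from the nose
    r, drdx  : body radius and its slope dr/dx at each station
    phi      : circumferential angles in radians spanning [0, 2*pi]; phi = 0 is leeward
    cp       : cp[i][j] is the pressure coefficient at (x[i], phi[j])
    Returns (C_N, C_m): reference area pi*D**2/4, reference length D, moment
    about the nose, C_N positive toward the leeward side, C_m positive nose-up.
    """
    s_ref = math.pi * diameter ** 2 / 4.0
    # Circumferential integral of Cp*cos(phi) at each station; a uniform
    # pressure drops out because cos(phi) integrates to zero around the body.
    ring = [trapz([cp[i][j] * math.cos(p) for j, p in enumerate(phi)], phi)
            for i in range(len(x))]
    dcn_dx = [-r[i] * ring[i] / s_ref for i in range(len(x))]
    # Moment arm x + r*dr/dx adds the axial-force contribution on sloped surfaces.
    dcm_dx = [r[i] * (x[i] + r[i] * drdx[i]) * ring[i] / (s_ref * diameter)
              for i in range(len(x))]
    return trapz(dcn_dx, x), trapz(dcm_dx, x)


# Synthetic check (not paper data): a cylinder 3 diameters long, higher Cp windward.
n_x, n_phi = 31, 73
x = [3.0 * i / (n_x - 1) for i in range(n_x)]
phi = [2.0 * math.pi * j / (n_phi - 1) for j in range(n_phi)]
cp = [[-0.1 * math.cos(p) for p in phi] for _ in x]
c_n, c_m = pressure_loads(x, [0.5] * n_x, [0.0] * n_x, phi, cp, 1.0)
alpha = math.radians(4.0)
c_m_alpha = c_m / alpha  # per radian; assumes C_m(0) = 0 and a linear C_m(alpha)
```

For this synthetic cylinder the sketch gives C_N = 0.6 and C_m = -0.9: a normal force acting aft of the nose produces a nose-down moment under the assumed sign convention.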

2. M549 PROJECTILE, $0.7 < M_\infty < 1.5$, $\alpha = 2^\circ$

Numerical computations were made for the M549 projectile at various transonic
speeds, $0.7 < M_\infty < 1.5$, at an angle of attack $\alpha = 2^\circ$. Qualitative features of
the flowfield obtained from some of these calculations are shown in Figures
17-21, where Mach number contours have been plotted for M = 0.85, 0.90, 0.92,
0.94, and 0.98 in both the windward and leeward planes. The asymmetry of the
wake-region flow is obvious from these figures. They also indicate the
development and asymmetric locations of the shock waves on the projectile at
transonic speeds. At low transonic speeds, for example at M = 0.85 (Figure 17),
the shock waves are just beginning to form, especially near the boattail
junction. When the Mach number is increased to M = 0.90, shocks have already
formed on the projectile near both the cylinder and the boattail junctions. The
flow expansions at these junctions can also be clearly seen in this figure. A
small asymmetry in the shock-wave locations can be observed, particularly for the
boattail shocks: the shock wave on the boattail on the windside is a little closer
to the base than its counterpart on the leeside. In addition, these shocks have
moved a little downstream from the boattail junction. As shown in Figures 19-21,
with a further increase in Mach number to 0.92, 0.94, and 0.98, the shock waves
(both on the cylinder and on the boattail) become stronger and gradually move
downstream. The asymmetry in the location of the shock waves becomes
increasingly clear. As seen in Figure 21 for M = 0.98, the shock-wave pattern is
complicated and the boattail shocks are located very close to the base corners.

The static aerodynamic coefficients have been obtained from the computed
flowfields. As pointed out earlier, the slope of the pitching moment coefficient,
$C_{m_\alpha}$, is of greater concern. Figure 22 shows the development of $C_{m_\alpha}$ over the
projectile for various transonic speeds. It is actually the cumulative moment
coefficient referenced to the nose, so the value at the end of the body
(X/D = 5.645) is the final result. The difference in this coefficient over the
nose portion is practically negligible for all transonic Mach numbers. The
largest effect is seen on the cylinder and boattail sections; the boattail has a
dramatic effect, as evidenced by the sharp rise in all the curves. Figure 23
shows a comparison of the computed $C_{m_\alpha}$ with experimental data. Here $C_{m_\alpha}$ is
referenced to the center of gravity (C.G.) of the projectile. One can clearly see
the sharp rise in $C_{m_\alpha}$ between M = 0.8 and 0.94, followed by a sharp drop with
further increase in Mach number, in both the computation and the experimental
data. This critical aerodynamic behavior observed in the experimental data is
clearly predicted by the computations, and good agreement has been obtained
between the computed result and the data.
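The curves in Figure 22 are running integrals from the nose, while Figure 23 quotes the coefficient about the center of gravity, so two small operations are involved: a cumulative integral along the body and a transfer of the moment reference point. The Python sketch below shows both under stated assumptions: sectional moment and normal-force contributions are already available from integrating the surface pressures, C_m is positive nose-up, C_N is positive toward the leeward side, and the new reference point lies on the body axis. Because the transfer is linear, the same relation holds for the slopes, $C_{m_\alpha,cg} = C_{m_\alpha,nose} + C_{N_\alpha}\,x_{cg}/D$. The function names and the synthetic load are hypothetical.

```python
def cumulative_moment(x, dcm_dx):
    """Running integral of dC_m/dx from the nose (trapezoidal rule).

    The last entry is the total for the body, i.e. the value read at the base.
    """
    out = [0.0]
    for i in range(len(x) - 1):
        out.append(out[-1] + 0.5 * (dcm_dx[i] + dcm_dx[i + 1]) * (x[i + 1] - x[i]))
    return out


def moment_about_cg(c_m_nose, c_n, x_cg, diameter):
    """Move the moment reference from the nose to an on-axis point x_cg aft of it.

    C_m,cg = C_m,nose + C_N * x_cg / D, for C_m positive nose-up and C_N
    positive toward the leeward side (assumed sign conventions).
    """
    return c_m_nose + c_n * x_cg / diameter


# Synthetic load: uniform sectional normal force dC_N/dx = 0.2 on a body
# 3 diameters long (D = 1), so dC_m/dx = -0.2 * x about the nose.
x = [3.0 * i / 30 for i in range(31)]
running = cumulative_moment(x, [-0.2 * xi for xi in x])
c_m_nose = running[-1]
c_n = 0.6
c_m_cg = moment_about_cg(c_m_nose, c_n, 1.5, 1.0)
```

With this uniform load the moment about the nose is -0.9 and the moment about the midpoint (x = 1.5 D) is zero, as expected for a load centered on the reference point.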


V.  CONCLUDING  REMARKS 

In  conjunction  with  a  new  Navier-Stokes  code,  a  simple  composite  grid 
scheme  has  been  developed  which  allows  fine  computational  grids  needed  for 
accurate  transonic  flow  computations  to  be  obtained  on  CRAY  X-MP/48  or  Cray  2 
computers.  The  numerical  method  uses  an  implicit,  approximately  factored, 
partially  upwind  (flux-split)  algorithm. 

The three-dimensional transonic flowfield computations have been made over two
projectiles for different flow conditions and angles of attack. The computed
flowfields show the development of asymmetrically located shock waves on the
projectiles at various transonic speeds. For the SOCBT projectile, computed
surface pressures have been compared with experimental data and are found to be
in good agreement. The slope of the pitching moment coefficient, $C_{m_\alpha}$,
determined from the computed flowfields, shows the critical aerodynamic
behavior. For the M549 projectile, the computed $C_{m_\alpha}$ has been compared with
experimental data; the computation shows the same critical behavior seen in the
data, and the agreement between the computed result and experimental data is
good.

The results of this research provide the basis for a new capability to compute
three-dimensional transonic flowfields over projectiles. This capability, in
conjunction with the supercomputers at the US Army Ballistic Research
Laboratory, has led to the first successful prediction of the critical
aerodynamic behavior in $C_{m_\alpha}$ of artillery shell at transonic speeds. The next
step is the numerical prediction of the Magnus force and moment for spinning
projectiles at angle of attack, which involves calculations of the full
three-dimensional transonic flowfields.
Figure 3a. Longitudinal cross-section of the 3D grid.

Figure 3b. Expanded view of the base region.

Figure 5. M549 projectile.

Figure 7. Mach contours, SOCBT projectile, $M_\infty = 0.94$.

Figure 11a. Computed Mach contours, $M_\infty = 0.96$, $\alpha = 4^\circ$, SOCBT projectile (with sting).

Figure 11b. Experimental shadowgraph, $M_\infty = 0.96$, $\alpha = 4^\circ$, SOCBT projectile (with sting).

Figure 13a. Longitudinal surface pressure distribution, SOCBT projectile, $M_\infty = 0.96$, $\alpha = 4^\circ$, windside.

Figure 13b. Longitudinal surface pressure distribution, SOCBT projectile, $M_\infty = 0.96$, $\alpha = 4^\circ$, leeside.

Figure 14a. Longitudinal surface pressure distribution, SOCBT projectile, $M_\infty = 0.98$, $\alpha = 4^\circ$, windside.

Figure 14b. Longitudinal surface pressure distribution, SOCBT projectile, $M_\infty = 0.98$, $\alpha = 4^\circ$, leeside.
DYNAMIC RESPONSE OF RECTANGULAR STEEL PLATES OBLIQUELY IMPACTING A RIGID TARGET


Aaron  Das  Gupta 

US  Army  Ballistic  Research  Laboratory,  US  Army  Laboratory  Command 
Aberdeen  Proving  Ground,  Maryland  21005 


ABSTRACT. The dynamic response of rectangular steel plates of two different
thicknesses obliquely impacting a rigid semi-infinite target was modeled using
the STEALTH-3D hydrodynamic code, which is based on an explicit Lagrangian
finite-difference formulation for solids, structural and thermo-hydraulic
analysis. The motivation for this analysis is the need to assess transient loads
and deformation in the contact zone due to impact of plates of various
materials, geometry, initial velocity and angle of obliquity, in order to assure
structural integrity and avoid premature failure. The STEALTH code was developed
by Science Applications Inc.; a DOD solids and structural version without the
thermo-hydraulic analysis capability has been employed for this investigation.

The plates were impulsively driven to a high velocity prior to oblique impact
upon a rigid wall and were modeled using a three-dimensional 11 × 11 × 3 mesh
configuration. Only one-half of each plate was modeled, by virtue of lateral
symmetry. The Mie-Gruneisen equation of state for steel under hydrostatic
compression and elastic perfectly-plastic behavior for deviatoric material
strength were included in the material model. Initially, a very small time step,
two orders of magnitude below the Courant stability criterion for the smallest
mesh, was used to stabilize the explicit integration near the impacted region.

For the relatively thick plate, the results indicate a large hydrostatic
pressure rise at the initial contact zone, resulting in severe elasto-plastic
deformation and plate bending; this causes the leading edge to separate while the
trailing zone contacts the rigid surface as the plate continues to slide along
the rigid wall. For the thin plate, the occurrence of a plastic hinge, localized
bending near the leading edge, and sliding along the target surface are
observed. Hourglassing instability along the boundary did not adversely affect
the computation near the leading edge, and a major portion of the impulse
occurred during the initial 40-50 microseconds. The computations indicate that
impact loading upon the wall can be accurately estimated using a refined mesh
near the leading edge of the plate.


1. INTRODUCTION. The capability to predict the effect of hypervelocity plate
impact on a rigid structure is a necessary first step towards the design and safe
operation of protective enclosures (1,2) commonly used in nuclear power plants.
This problem is also of interest to the Ballistic Research Laboratory because of
the possibility of fragment-induced damage (3) to target enclosures in the
terminal ballistics test facilities, which might result in catastrophic rupture
when the blast loading is applied.

A number of studies have been performed and damage data obtained (4-7) over the
years. However, most of the available data are in the form of impulse
correlation curves and crater shapes in plates due to slender rods, while
relatively little has been reported on plate impact upon walls involving
contact, sliding and separation.


Recently, computations using hydrodynamic codes (8-10) have been reported.
This paper documents numerical studies conducted using the explicit
finite-difference computer code STEALTH (11) to simulate and predict the
transient nonlinear behavior resulting from the oblique impact of a plate on a
rigid surface. The goals of this numerical simulation were to aid in
understanding the process by which fragments impact and deform prior to
separation, and to demonstrate the applicability of the code in obtaining
loading functions in the contact region.


2. IMPACT CONDITION. For the thick plate problem, the flyer plate is assumed to
be a 60 cm × 35 cm × 2.54 cm rectangular steel plate impacting a rigid surface at
an inclination of 20 degrees to the horizontal surface. The plate is assumed to
be accelerated to a constant velocity of 200 m/s prior to impact.

For the second case, the thin plate impact problem, the plate is only 1.6 mm
thick and impacts a rigid surface at an angle of obliquity of 60 degrees. The
plate is assumed to be 31 cm in length and 22 cm in width. The initial contact
occurs along the length of the plate at the bottom edge. The plate is assumed to
have an initial velocity of 900 m/s prior to impact.


3. MATERIAL MODEL. Computation of stresses and strains in the STEALTH code
involves calculation of deviatoric components, which depend upon shear strength
characteristics, as well as hydrostatic components, which are governed by the
high-pressure equation of state of the deformable material.

For the thick plate problem, quasi-static properties of mild steel and an
elastic perfectly-plastic representation of the constitutive relationship were
employed for the deviatoric strength of the plate material. This model is
available in the standard material library of the code.

The yield strength was 3.0 kbar and the shear modulus was approximately
0.62 Mbar. The shear modulus G was calculated from Young's modulus E using the
relationship

$$G = \frac{E}{2(1 + \nu)}$$

where $\nu$ is Poisson's ratio, which was found to be 0.29.

For the hydrostatic compression, a modified form of the Mie-Gruneisen equation
of state for shock propagation in solids is available in the code and can be
written as

$$P(\mu, U) = A\mu + B\mu^{2} + C\mu^{3} + U\,(D + F\mu + H\mu^{2})$$

where A, B, C and D, F, H are material constants determined experimentally from
Hugoniot pressure-volume states obtained in shock transitions, $\mu$ is the ratio of
the specific volume change to the initial volume, and U is the internal energy
density. Material parameters for a variety of solids are available in the
standard material library of the STEALTH code.
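The polynomial equation of state above can be evaluated directly, and its derivative with respect to μ at fixed U reproduces the first terms (A + 2Bμ + 3Cμ² + U(F + 2Hμ)) of the sound-speed expression given later in this section. The Python sketch below assumes μ = (V0 - V)/V0, positive in compression, as a reading of the text's definition. The coefficient values in the example are placeholders, not the STEALTH library constants, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PolynomialEOS:
    """P(mu, U) = A mu + B mu^2 + C mu^3 + U (D + F mu + H mu^2).

    mu is assumed to be (V0 - V) / V0 (positive in compression) and U the
    internal energy per unit volume. D, F, H default to zero, in which case
    the pressure depends on mu alone.
    """
    a: float
    b: float
    c: float
    d: float = 0.0
    f: float = 0.0
    h: float = 0.0

    def pressure(self, mu: float, u: float) -> float:
        return (self.a * mu + self.b * mu ** 2 + self.c * mu ** 3
                + u * (self.d + self.f * mu + self.h * mu ** 2))

    def dp_dmu(self, mu: float, u: float) -> float:
        """Partial derivative of P with respect to mu at fixed U."""
        return self.a + 2.0 * self.b * mu + 3.0 * self.c * mu ** 2 + u * (self.f + 2.0 * self.h * mu)


# Placeholder coefficients in Mbar, chosen only to exercise the formula.
eos = PolynomialEOS(a=1.6, b=1.0, c=0.5)
p = eos.pressure(0.1, 0.0)     # 0.16 + 0.01 + 0.0005 Mbar
k_mu = eos.dp_dmu(0.1, 0.0)    # 1.6 + 0.2 + 0.015 Mbar
```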

The bulk modulus, K, was calculated from the relationship

$$K = \frac{E}{3(1 - 2\nu)}$$

The shear modulus, G, is related to the bulk modulus for both the loading and
unloading phases as

$$G = \frac{3K(1 - 2\nu)}{2(1 + \nu)}$$
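The elastic relations quoted here are mutually consistent: eliminating E between the first two gives the third. The short Python check below evaluates them with the mild-steel values in the text (G = 0.62 Mbar, ν = 0.29). It also inverts them to find the Poisson's ratio implied by the 304-steel values quoted later (G = 0.77 Mbar, initial K ≈ 1.648 Mbar); that implied value, about 0.298, is derived here and is not stated in the paper. The function names are illustrative.

```python
def youngs_from_shear(g: float, nu: float) -> float:
    """E = 2 G (1 + nu)."""
    return 2.0 * g * (1.0 + nu)


def bulk_from_youngs(e: float, nu: float) -> float:
    """K = E / [3 (1 - 2 nu)]."""
    return e / (3.0 * (1.0 - 2.0 * nu))


def shear_from_bulk(k: float, nu: float) -> float:
    """G = 3 K (1 - 2 nu) / [2 (1 + nu)]."""
    return 3.0 * k * (1.0 - 2.0 * nu) / (2.0 * (1.0 + nu))


def poisson_from_bulk_shear(k: float, g: float) -> float:
    """nu = (3K - 2G) / [2 (3K + G)], the inverse of the relations above."""
    return (3.0 * k - 2.0 * g) / (2.0 * (3.0 * k + g))


# Mild-steel values quoted in the text (Mbar): G = 0.62, nu = 0.29.
e_mild = youngs_from_shear(0.62, 0.29)    # about 1.60 Mbar
k_mild = bulk_from_youngs(e_mild, 0.29)   # about 1.27 Mbar
g_back = shear_from_bulk(k_mild, 0.29)    # recovers 0.62 Mbar

# 304-steel values quoted later (Mbar): G = 0.77, initial K = 1.648.
nu_304 = poisson_from_bulk_shear(1.648, 0.77)  # implied, about 0.298
```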

The sound speed, c, in the parent material can be obtained from the deviatoric
and hydrostatic compression behavior as

$$c = A + 2B\mu + 3C\mu^{2} + U\,(F + 2H\mu) + \frac{P}{V^{2}}\,(D + F\mu + H\mu^{2}) + 1.333\,G$$

where V is the specific volume and P is the hydrostatic pressure.

Artificial  viscosity  parameters  for  both  linear  and  quadratic  damping 
were  employed  to  damp  out  spurious  oscillations.  However,  zero  energy  modes 
due  to  hourglassing  at  the  boundary  could  not  be  damped  out  due  to  lack  of 
an  appropriate  tensorial  hourglass  viscosity  parameter.  This  was  not 
considered  to  be  a  significant  problem  since  the  instability  did  not  appear 
to  grow  with  time  and  did  not  seem  to  affect  the  contact  zone  near  the 
leading  edge. 

In the second case, the thin plate impact problem, an AISI 304 grade steel
already available in the standard material library was considered.

For the deviatoric strength, an elastic strain-hardening model representation
was used. The yield strength was 20.0 kbar and a hardening exponent of 0.035 was
employed. The shear modulus was 0.77 Mbar, while the spallation threshold was
-0.02. The model also included a thermal softening capability.

For the hydrostatic compression part, a Gruneisen volume coefficient of 1.4753
and an energy coefficient of 2.17 were employed in the Mie-Gruneisen equation of
state for the grade 304 steel, as opposed to the null values used in the earlier
case for the low-carbon steel. Additionally, a hardening coefficient of 40.0 and a
corresponding hardening exponent of 0.35 were used to model the strain-hardening
part of the deviatoric strength behavior, in contrast with the perfectly-plastic
assumption in the thick plate problem. The initial bulk modulus was
approximately 1.648 Mbar.
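The text gives a hardening coefficient and exponent but not the expression they enter. A common power-law form, used for example in the Steinberg-Guinan strength model, is Y = Y0 (1 + β ε_p)^n. The Python sketch below assumes that form, with Y0 taken as the quoted 20.0 kbar yield strength, β = 40.0 and n = 0.35. The text quotes the exponent as 0.035 in one place and 0.35 in another; 0.35 is used here. The optional cap `y_max` is a placeholder, not a value from the paper, and pressure and thermal-softening terms are omitted.

```python
import math


def flow_stress(eps_p: float, y0: float = 20.0, beta: float = 40.0,
                n: float = 0.35, y_max: float = math.inf) -> float:
    """Assumed power-law work hardening: Y = y0 * (1 + beta * eps_p) ** n, capped at y_max.

    eps_p is the equivalent plastic strain (dimensionless, >= 0); the result has
    the units of y0 (kbar here). The form and the cap are illustrative assumptions.
    """
    if eps_p < 0.0:
        raise ValueError("equivalent plastic strain must be non-negative")
    return min(y0 * (1.0 + beta * eps_p) ** n, y_max)


y_initial = flow_stress(0.0)    # 20.0 kbar at zero plastic strain
y_hardened = flow_stress(0.1)   # about 35 kbar after 10% plastic strain
```

Under these assumptions, 10% plastic strain raises the flow stress from 20 to about 35 kbar, which shows why the strain-hardening description matters for the 900 m/s thin plate impact.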


4.  COMPUTATIONAL  ALGORITHM.  The  STEALTH  code  was  used  to  simulate  the 
dynamic  response  due  to  impact  in  both  cases.  The  code  (11,12)  solves  the 
partial  differential  equations  of  continuum  mechanics  using  an  explicit 
finite-difference  method  formulated  in  a  Lagrangian  moving  coordinate  frame. 

In  the  Lagrange  system,  fixed  mass  units  translate,  rotate,  compress, 
expand  and  distort.  Momentum  is  associated  with  the  motion  of  the  mass  and 
internal  energy  is  fixed  to  the  mass  unit.  The  STEALTH  solutions  are  second- 
order  accurate  in  space  and  time.  A  complete  description  of  the  Lagrangian 
equations  solved  by  the  STEALTH  code  is  given  in  the  user's  manual  (11). 

Several rezoning options are available in the program for updating grid point
locations and variables in case of large mesh distortion or grid entanglement.
Pressure discontinuities are handled by smearing out the discontinuity with a
von Neumann quadratic artificial viscosity. Zone-to-zone oscillations are damped
out by means of a linear artificial viscosity.
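A common one-dimensional form of the combined quadratic (von Neumann-Richtmyer) and linear artificial viscosity is q = ρ(c_Q Δu² + c_L c |Δu|), applied only while a zone is being compressed. The Python sketch below assumes that form. STEALTH's exact expression and coefficient values are not given in the paper, so `c_quad` and `c_lin` are placeholder values and the function name is hypothetical.

```python
def artificial_viscosity(rho: float, du: float, c: float,
                         c_quad: float = 2.0, c_lin: float = 0.1) -> float:
    """Zone artificial viscous pressure q (same units as rho * c**2).

    du is the velocity jump across the zone (u_right - u_left) and c the local
    sound speed. q is applied only in compression (du < 0): the quadratic term
    spreads a shock over a few zones, and the linear term damps zone-to-zone
    ringing. The coefficients are placeholders, not STEALTH's values.
    """
    if du >= 0.0:
        return 0.0
    return rho * (c_quad * du * du + c_lin * c * abs(du))


q_expanding = artificial_viscosity(1.0, 1.0, 5.0)    # 0.0: zone is expanding
q_compressed = artificial_viscosity(2.0, -1.0, 5.0)  # 2 * (2*1 + 0.1*5*1) = 5.0
```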
Stability of the differential equations is automatically regulated by the
Courant stability criterion, which can be described as

$$\Delta t = \frac{\Delta t_c}{n} = \frac{\Delta L_{min}}{n\sqrt{E_{y\max}/\rho}}$$

where $\Delta t_c$ is the minimum Courant stability step size,

$\Delta L_{min}$ is the distance between the two closest mesh points in the system,

$E_{y\max}$ is the Young's modulus of the stiffest material,

$\rho$ is the density of the material, and

$n$ is the number of time steps with which we wish to represent the shock wave in
passing through the distance $\Delta L_{min}$.
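As a worked example of the criterion, the Python sketch below evaluates Δt for an assumed steel-like material (E = 2.0 × 10¹¹ Pa, ρ = 8000 kg/m³, giving a bar wave speed of 5000 m/s) and an assumed smallest spacing of 0.4 mm, which corresponds to dividing the 1.6 mm thin plate into four equal zones through the thickness. These inputs are illustrative, not values reported in the paper. Setting n = 100 corresponds to the time step of about 1% of the wave transit time mentioned in Section 5.

```python
import math


def courant_step(dl_min: float, e_max: float, rho: float, n: int = 1) -> float:
    """Time step from the Courant criterion: dt = dl_min / (n * sqrt(e_max / rho)).

    dl_min : smallest distance between two mesh points (m)
    e_max  : Young's modulus of the stiffest material (Pa)
    rho    : density (kg/m^3)
    n      : steps used to carry a wave across dl_min (n = 1 gives dt_c itself)
    """
    wave_speed = math.sqrt(e_max / rho)
    return dl_min / (n * wave_speed)


# Assumed inputs: 0.4 mm spacing, E = 2.0e11 Pa, rho = 8000 kg/m^3.
dt_c = courant_step(4e-4, 2.0e11, 8000.0)          # 8e-8 s, i.e. 0.08 microseconds
dt = courant_step(4e-4, 2.0e11, 8000.0, n=100)     # 8e-10 s
```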


5. NUMERICAL MODELING. In both cases, only one-half of each plate was modeled by
virtue of its lateral symmetry. For the first problem, the generic
three-dimensional computational model has 11 meshes along the length and an
equal number along the width, as well as 3 rows of mesh through the thickness of
the flyer plate segment. An isometric view of the initial configuration of the
undeformed mesh is shown in Figure 1.

In the second case, the entire plate was modeled using a somewhat refined mesh,
with 14 rows of mesh along the length and width as well as 4 rows through the
0.16 cm thickness. Because of this small thickness, an initial time step an order
of magnitude lower than in the previous case was required to avoid violating the
stability criterion of the explicit integration scheme at the outset. A refined
mesh in the contact region is expected to give a more accurate description of the
impact forces. However, a very small time step, approximately 1% of the time a
wave takes to traverse the smallest mesh, was needed to stabilize the computation
in the impacted region.

The computational procedure used in modeling the angular impact process can be
summarized as follows:

a. The rigid surface acts as a fixed boundary for the finite-difference grid.

b. The forces acting on the rigid surface in the impacted region are calculated
by STEALTH from the stresses developed in the deformable plate and summed to give
the resultant cell-averaged contact pressure. Frictionless contact is assumed
between the plate and the rigid surface.

c. Additionally, STEALTH computes the new grid point positions due to sliding
upon impact. These updated locations are then used as input for the next
computational cycle. No rezoning is used in this calculation.
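Steps a to c describe frictionless contact with a fixed rigid boundary. The paper does not give STEALTH's internal wall-interaction algorithm, so the Python sketch below shows one generic way such a constraint is often imposed on a single Lagrangian grid point after its position update. The function and argument names are hypothetical, and contact forces are not computed here.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def enforce_rigid_wall(pos, vel, wall_point, normal):
    """Generic frictionless rigid-wall constraint for one grid point (illustrative).

    normal is the unit normal of the wall plane, pointing toward the plate side.
    A point that has penetrated the wall is projected back onto the plane and
    loses its inward normal velocity; the tangential velocity is left untouched,
    so the point can slide freely. A point moving away from the wall is not held,
    which allows separation. Returns (pos, vel, in_contact).
    """
    gap = dot([p - w for p, w in zip(pos, wall_point)], normal)
    v_n = dot(vel, normal)
    in_contact = gap <= 0.0
    if gap < 0.0:
        pos = [p - gap * n for p, n in zip(pos, normal)]
    if in_contact and v_n < 0.0:
        vel = [v - v_n * n for v, n in zip(vel, normal)]
    return tuple(pos), tuple(vel), in_contact


# Wall z = 0; a point that has just penetrated while moving down and forward.
p1, v1, c1 = enforce_rigid_wall((1.0, 2.0, -0.001), (150.0, 0.0, -60.0),
                                (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
# A point above the wall and moving away from it is left alone.
p2, v2, c2 = enforce_rigid_wall((0.0, 0.0, 0.01), (1.0, 0.0, 5.0),
                                (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

The penetrating point ends on the wall with its 150 m/s tangential velocity intact, which is the sliding behavior the frictionless assumption produces in the results.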


6. DYNAMIC RESPONSE COMPUTATION. The entire bottom and front surfaces of the
plate were designated as a wall interaction boundary to allow initial contact and
subsequent interaction with the rigid surface. Dynamic response calculations were
initially performed for 40 microseconds after impact and were later extended up
to 0.3 ms to monitor the post-impact response, which should show sliding and
separation in addition to transverse bending effects.

The dynamic response of the relatively thick plate is shown in Figures 2a-2e,
which describe the behavior from 10 to 300 microseconds after impact. After
initial contact along the leading edge, surface contact at the bottom frontal
element is visible in Figure 2a at an elapsed time of 10 microseconds, signalling
the onset of bending. A significant amount of bending is visible in Figure 2b at
125 microseconds after impact in the first two rows of elements at the forward
end of the plate, while the contact area has increased beyond the first set of
elements at the bottom surface.

At 200 microseconds after impact, bending appears to propagate and affect the
first three rows of elements, while the contact area has progressed to the first
two rows of elements at the forward bottom of the plate, as shown in Figure 2c.
The compressive stresses that develop near the forward end from the initial
compression of the leading edge are significantly altered by the onset of
bending, which produces compressive stresses in the top fibers and tensile
stresses in the bottom fibers.

With a further increase in response time, the bending stress wave propagates
towards the rear of the plate. At 250 microseconds, lifting of the forward edge
and separation from the rigid surface are indicated, as shown in Figure 2d. This
is accompanied by a backward shift of the contact region as the third row of
elements at the bottom comes into contact with the rigid wall boundary. Bending
has now progressed beyond the first three rows of elements at the forward end.

At an extended response time of 300 microseconds, the post-impact process
continues, causing further separation and upward bending of the leading edge and
the forward end, while the contact zone shifts backward, indicating partial
contact of the second and fourth rows of elements at the bottom surface and
complete contact of the third row of elements, as shown in Figure 2e. This
deformation process is realistic in the sense that significant bending and
separation of the forward end are expected at a shallow angle of attack.

For the thin plate problem, dynamic response studies were conducted up to 700
cycles, corresponding to an elapsed time of only 40 microseconds. This is
because the automatically adjusted time steps were an order of magnitude lower
than those for the previous case, owing to the very small thickness of the plate
and the use of a refined mesh scheme for this model.

Typical deformation patterns at 20 and 40 microseconds are shown in Figures 3a
and 3b, respectively. Due to a lack of sufficient resolution, the grid appears as
a dark band in the end view. The onset of bending is clearly visible in Figure 3a.
Comparison of the two figures indicates sliding along the rigid surface in the
direction of the horizontal component of the initial velocity vector. This is
expected, since a zero friction coefficient has been imposed along the
interacting boundary. Additionally, evidence of thin-plate buckling and the
formation of a plastic hinge approximately two mesh points away from the leading
edge can be observed in Figure 3b. These phenomena create severe compression of
elements near the leading edge along the thickness direction, requiring a further
drop in the allowable computational time step to avoid instability. At longer
time steps, spurious oscillations throughout the plate and grid instability are
observed due to hourglassing and the consequent zero-energy modes, which may be
controlled by a tensor-triangle artificial viscosity.

A  typical  forcing  function  for  the  thick  plate  problem  generated  by  the 
STEALTH  code  is  shown  in  Figure  4.  The  forcing  function  is  computed  as  a  cell 
averaged  pressure  at  the  center  of  the  bottom  mesh  near  the  leading  edge 
where  initial  contact  occurs.  Since  the  contact  zone  propagates  along  this 
bottom  mesh  surface,  the  cell  averaged  pressure  can  be  a  reasonably 
acceptable measure of the impact load in the contact zone. The accuracy of this
measure in representing the actual contact load can be improved by refining the
grid, particularly near the impacted region of the plate.

As depicted in Figure 4, the impact pressure has a rather steep climb and a sharp
peak of 2.5 kbar at an early time of 4.0 microseconds after impact. This is
followed by an equally steep drop between 4.0 and 12.0 microseconds, unloading to
zero. Subsequently the plate reloads to approximately 6.0 kbar at 46.0
microseconds and oscillates about the 3.0 kbar level for a rather extended period
of time beyond 46.0 microseconds. This ringing behavior of the plate is probably
due to reflection of stress waves from the top and bottom surfaces near the
forward end of the plate, and it appears to decay gradually in amplitude with
time. Beyond 40 microseconds, separation of the leading edge and the forward
bottom mesh from the rigid surface causes a drop in the impact load to the fully
unloaded level.

The contact pressure-time history due to oblique impact of a flyer plate for the
thin plate problem shows a similar trend, although a peak pressure of 7.5 kbar
occurs early with a considerably shorter duration, and the subsequent oscillation
is rather noisy due to the propagation of stress waves through the plate
material. Hourglassing instability did not adversely affect the contact zone, and
the contact pressure could be monitored until time-step instability or tip
separation took place. However, a major part of the total impulse is contained in
the first 40-50 microseconds, and further computation does not contribute
significantly to the forcing function.
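The observation that most of the impulse is delivered in the first 40-50 microseconds can be checked on any sampled contact-pressure history by integrating it in time. The Python sketch below does this with the trapezoidal rule. The function names are hypothetical, and the short pressure history in the example is synthetic (arbitrary units), not data read from Figure 4.

```python
def cumulative_impulse(t, p):
    """Running impulse per unit area, I(t) = integral of p dt (trapezoidal rule)."""
    out = [0.0]
    for i in range(len(t) - 1):
        out.append(out[-1] + 0.5 * (p[i] + p[i + 1]) * (t[i + 1] - t[i]))
    return out


def impulse_fraction(t, p, t_cut):
    """Fraction of the total impulse delivered at or before the sample time t_cut."""
    imp = cumulative_impulse(t, p)
    k = max(i for i, ti in enumerate(t) if ti <= t_cut)
    return imp[k] / imp[-1]


# Synthetic history: a sharp initial pulse followed by a weaker reloading pulse.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [0.0, 2.0, 0.0, 1.0, 0.0]
frac = impulse_fraction(t, p, 2.0)   # two thirds of the impulse is in the first pulse
```

With real STEALTH output, t would be the cycle times in microseconds and p the cell-averaged contact pressure, and the fraction at 40-50 microseconds would quantify the statement above.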


7. CONCLUSION. Results from numerical simulations of thick and thin plates
impacting a rigid surface are presented. These results are conservative in the
sense that the peak pressures and deformation from impact on a rigid surface
should be higher than those due to impact on a deformable surface. If the forcing
function from a nonresponding surface is applied to drive a responding model in a
structural response code, a small error, in the form of somewhat higher
displacements and stress levels, should be expected.

In  some  cases  this  may  be  desirable  since  the  margin  of  safety  from  a 
structural  integrity  standpoint  could  be  enhanced  using  this  procedure. 

In the absence of any experimental data, the physically plausible deformation
patterns lend confidence to the predicted results from the STEALTH code, which
yield valuable insight into the post-impact response behavior of plates. The
complex phenomena of sliding and separation are demonstrated using the 3-D
computational model. In spite of the simplifying assumptions of frictionless
sliding, a nonresponding surface and quasi-static idealized material data, useful
data for plates of varying thickness, initial velocity and inclination could be
generated in a cost-effective and efficient manner using the STEALTH code.
Figure 4: Typical contact pressure generated by the STEALTH code for the plate
impact problem.
ABSTRACT. We discuss mesh moving and local mesh refinement that are used with
MacCormack's scheme to solve the Euler equations for inviscid compressible flow
in two space dimensions. A coarse base mesh of quadrilateral cells is moved by an
algebraic mesh movement function so as to follow and isolate distinct phenomena.
The local mesh refinement algorithm recursively divides the time step and spatial
cells of the moving base mesh in regions where the error indicators are high. A
mesh generation procedure is used to create the initial base mesh. MacCormack's
scheme is given total variation diminishing (TVD) artificial viscosity in order
to compute shocks and discontinuities.

The  time  step  is  adjusted  automatically  to  maintain  stable  computation  by 
calculating  the  maximum  eigenvalues  of  the  Euler  Equations  on  the  computationa 
mesh.  Results  are  presented  for  computational  examples  involving  planar  and 
cylindrical  blasts. 

1. INTRODUCTION. The numerical solution of the Euler equations is frequently
difficult because the nature, location, and duration of fine-scale structures
are often not known in advance. Thus calculation on a uniform or prescribed
mesh can fail to adequately resolve the fine-scale phenomena or can have excessive
computational costs. Adaptive mesh procedures that evolve with the solution
offer a robust, reliable, and efficient alternative. Such techniques for
time-dependent problems are capable either of creating finer meshes in regions
of excessive error [1,2,3,4,5] or of moving meshes to follow isolated dynamic
phenomena [1,2,5,6,7,8,9]. The use of these techniques is enhanced when they
are capable of providing an accurate error estimate for the computed solution
[1,2,4,10,11,12].


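Our procedure solves the two-dimensional Euler equations

$$\mathbf{u}_t + \mathbf{f}(\mathbf{u})_x + \mathbf{g}(\mathbf{u})_y = 0, \qquad
\mathbf{u} = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ e \end{bmatrix}, \quad
\mathbf{f}(\mathbf{u}) = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ u(e + p) \end{bmatrix}, \quad
\mathbf{g}(\mathbf{u}) = \begin{bmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ v(e + p) \end{bmatrix}, \tag{1}$$

on a rectangular domain Ω with well-posed initial and boundary conditions.
Here, u and v are the velocity components of the fluid in the x and y
directions, ρ is the fluid density, e is the total energy of the fluid per
unit volume, and p is the fluid pressure. For an ideal gas the equation of
state is

$$p = (\gamma - 1)\left[e - \rho\,(u^2 + v^2)/2\right], \tag{2}$$

where γ is the ratio of the specific heats.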
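To make (1) and (2) concrete, the following minimal sketch evaluates the pressure and the two flux vectors pointwise. It assumes NumPy, stores the conserved variables in the order (ρ, ρu, ρv, e), and takes γ = 1.4; these are illustrative choices of ours, not details of the authors' code.

```python
import numpy as np

GAMMA = 1.4  # ratio of specific heats; the value for air is an assumption here


def pressure(q, gamma=GAMMA):
    """Ideal-gas pressure from equation (2): p = (gamma - 1) * (e - rho*(u^2 + v^2)/2).

    q has shape (4, ...) and stores the conserved variables (rho, rho*u, rho*v, e).
    """
    rho, mx, my, e = q
    return (gamma - 1.0) * (e - 0.5 * (mx**2 + my**2) / rho)


def fluxes(q, gamma=GAMMA):
    """Return the x-direction flux f(q) and y-direction flux g(q) of the 2-D Euler equations."""
    rho, mx, my, e = q
    u, v = mx / rho, my / rho
    p = pressure(q, gamma)
    f = np.array([mx, mx * u + p, mx * v, u * (e + p)])
    g = np.array([my, my * u, my * v + p, v * (e + p)])
    return f, g


# Gas at rest with rho = 1.4 and p = 1 has e = p / (gamma - 1) = 2.5.
q0 = np.array([1.4, 0.0, 0.0, 2.5])
f0, g0 = fluxes(q0)
```

For gas at rest only the pressure terms survive, so f0 and g0 carry p = 1 in their momentum components and zeros elsewhere.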

We  use  MacCormack's  [13]  explicit  finite  difference  scheme  with  Davis's 
[14]  artificial  viscosity  to  calculate  solutions  of  (1)  at  each  node  of  a 
moving mesh of quadrilateral cells. We use a density switch (gradient) to
estimate error at the nodes of the mesh and a procedure to select the
proper time step size.

Our adaptive mesh algorithm was modified from a general-purpose scheme
and consists of three main parts: (i) movement of a coarse base mesh (cf. Arney
and Flaherty [6]), (ii) local refinement of the base mesh in regions where the
resolution is inadequate (cf. Arney and Flaherty [3]), and (iii) regeneration
of the base mesh when it becomes too distorted and unsuitable for further
computation (cf. Arney and Flaherty [15]). Proper mesh motion can reduce
errors;  however,  mesh  motion  alone  cannot  produce  solutions  that  satisfy 
prescribed  error  tolerances.  Therefore,  local  mesh  refinement  is  added  to 
recursively  solve  local  problems  in  regions  where  error  tolerances  are  not 
satisfied.  If  the  base  mesh  becomes  too  distorted  a  mesh  regeneration 
procedure  is  used  to  produce  a  better  base  mesh.  The  combination  of  our 
solution  method,  error  estimation,  and  adaptive  mesh  techniques  creates  a 
powerful  algorithm  since  the  solution  method  provides  robustness,  the  error 
estimation  and  mesh  refinement  provide  accuracy,  and  the  mesh  moving,  mesh 
regeneration,  and  time  step  selection  provide  efficiency. 

We  briefly  explain  the  various  parts  of  our  algorithm  in  Sections  2  and 
3.  The  results  of  their  use  on  an  example  problem  are  given  in  Section  4. 

The  status  of  our  algorithm  and  future  considerations  of  adaptive  methods  for 
the  Euler  equations  are  discussed  in  Section  5. 
2. SOLUTION SCHEME AND ERROR ESTIMATION ON A MOVING NONUNIFORM MESH.
MacCormack's scheme has been widely used to solve the Euler equations. The use of
artificial viscosity to make this scheme total variation diminishing (TVD)
makes it more attractive for solving problems with discontinuities. Our
error estimation, based on Richardson's extrapolation, produces a pointwise
approximation of the local discretization error which can be used to construct
global or local measures of the error.

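a. MacCormack's Scheme. In order to discretize (1) on a moving nonuniform
mesh, we introduce a transformation

$$\xi = \xi(x, y, t), \qquad \eta = \eta(x, y, t), \qquad \tau = t, \tag{3}$$

from the physical (x,y,t) domain to the computational (ξ,η,τ) domain, where a
uniform rectangular grid is used. Under this transformation (1) becomes

$$\mathbf{u}_\tau + \mathbf{u}_\xi \xi_t + \mathbf{u}_\eta \eta_t + \mathbf{f}_\xi \xi_x + \mathbf{f}_\eta \eta_x + \mathbf{g}_\xi \xi_y + \mathbf{g}_\eta \eta_y = 0. \tag{4}$$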

The two-step MacCormack scheme [13] then uses first-order forward temporal
and  spatial  difference  approximations  in  the  predictor  step  and  first-order 
backward  differences  in  the  corrector  step  to  solve  (4).  Hindman  [16]  showed 
that  proper  differencing  of  this  chain-rule  form  (4)  with  its  metrics  produces 
a  consistent  approximation.  Arney  and  Flaherty  [10]  showed  that  conservation 
is  also  maintained.  We  have  also  used  the  finite-difference  method  of  Harten 
[17]  in  this  adaptive  mesh  algorithm.  However,  because  of  flux  splitting  the 
mesh  must  be  constrained  to  remain  rectangular.  This  constraint  limits  the 
benefits  of  mesh  moving  but  does  not  affect  the  mesh  refinement. 
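The predictor-corrector structure can be sketched for the scalar model equation u_t + c u_x = 0 on a uniform periodic grid, a simplification we assume for illustration (the paper applies the scheme to the transformed system (4)). The predictor uses a forward spatial difference and the corrector a backward one; ν = cΔt/Δx is the Courant number.

```python
import numpy as np


def maccormack_step(u, nu):
    """One MacCormack step for u_t + c u_x = 0 on a periodic grid.

    nu = c * dt / dx is the Courant number (|nu| <= 1 for stability).
    """
    # Predictor: forward difference in space, forward in time.
    u_pred = u - nu * (np.roll(u, -1) - u)
    # Corrector: backward difference of the predicted values, averaged with the old values.
    return 0.5 * (u + u_pred - nu * (u_pred - np.roll(u_pred, 1)))


x = np.linspace(0.0, 1.0, 50, endpoint=False)
u0 = np.sin(2 * np.pi * x)
u1 = maccormack_step(u0, nu=0.5)
```

With ν = 1 the update reproduces an exact one-cell shift, and with periodic boundaries the discrete sum of u is conserved; both make quick checks of an implementation.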

The explicit MacCormack scheme has a stability restriction that limits
the time step allowed for a given spatial mesh. For efficient computation, we
choose the next time step adaptively to be close to the maximum allowed by the
Courant-Friedrichs-Lewy (CFL) condition.

b. Davis's Artificial Viscosity. MacCormack's scheme, being a second-order
accurate centered scheme, produces spurious oscillations near
discontinuities.  In  order  to  eliminate  or  reduce  these  oscillations, 
artificial  viscosity  is  added  to  the  solution  to  diffuse  the  discontinuity. 

We  use  an  artificial  viscosity  model  due  to  Davis  [14]  which  is  not  problem 
dependent  and  only  requires  knowledge  of  the  maximum  eigenvalues.  This 
artificial viscosity model is designed to convert MacCormack's scheme into a
TVD  scheme  in  one  dimension.  There  are  other  artificial  viscosity  models  for 
the  Euler  equations  that  produce  TVD  schemes  (cf.  Pulliam  [18]). 
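The sketch below shows the general shape of a limiter-based dissipation term of the kind described here, for a scalar problem on a periodic grid. It is written in the spirit of Davis [14] but does not reproduce his formulas: the limiter φ(r) = max(0, min(2r, 1)), the Courant-number-dependent coefficient C(ν), and the two smoothness ratios r⁺ and r⁻ are our assumptions. Because both one-sided ratios are used at every cell face, no upwind direction has to be identified.

```python
import numpy as np


def limiter(r):
    """Assumed limiter phi(r) = max(0, min(2r, 1)); phi = 1 (no added dissipation) for r >= 1/2."""
    return np.maximum(0.0, np.minimum(2.0 * r, 1.0))


def coeff(nu):
    """Assumed Courant-number-dependent coefficient C(nu)."""
    nu = abs(nu)
    return nu * (1.0 - nu) if nu <= 0.5 else 0.25


def safe_ratio(num, den, eps=1e-12):
    """num/den with den floored at eps (den is a square here, hence non-negative)."""
    return num / np.where(den < eps, eps, den)


def davis_type_dissipation(u, nu):
    """Symmetric limiter-based dissipation for a scalar u on a periodic grid.

    Returns D_{j+1/2} - D_{j-1/2}, to be added to the MacCormack update at node j.
    """
    du = np.roll(u, -1) - u                  # du[j]   = u[j+1] - u[j]  (Delta u_{j+1/2})
    du_m = np.roll(du, 1)                    # du_m[j] = u[j] - u[j-1]  (Delta u_{j-1/2})
    r_plus = safe_ratio(du_m * du, du * du)          # left-biased smoothness ratio
    r_minus = safe_ratio(du_m * du, du_m * du_m)     # right-biased smoothness ratio
    k_plus = 0.5 * coeff(nu) * (1.0 - limiter(r_plus))
    k_minus = 0.5 * coeff(nu) * (1.0 - limiter(r_minus))
    # Face j+1/2 combines K+ evaluated at node j and K- evaluated at node j+1.
    d_face = (k_plus + np.roll(k_minus, -1)) * du
    return d_face - np.roll(d_face, 1)
```

In use, the returned array would be added to the MacCormack update, with both computed from the same data u at the start of the step. The dissipation is in flux-difference form, so it leaves the discrete sum of u unchanged on a periodic grid, and it vanishes wherever φ = 1.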

Davis's artificial viscosity is based on a flux limiter that does not
depend on explicitly determining the upwind direction and, with a modification
by Roe [19], does not affect the region of stability of MacCormack's scheme.
Because MacCormack's scheme does not determine the upwind direction, the
combined use of MacCormack's scheme and Davis's artificial viscosity is
computationally simple. The artificial viscosity terms are calculated from
the solution data at the beginning of the time step. The maximum absolute
eigenvalues for the Euler equations on the computational mesh are computed
from the maximum absolute values of

$$\xi_t + \xi_x u + \xi_y v \pm a\sqrt{\xi_x^2 + \xi_y^2} \qquad \text{and} \qquad \eta_t + \eta_x u + \eta_y v \pm a\sqrt{\eta_x^2 + \eta_y^2} \tag{5}$$

in the ξ and η directions, respectively [18], where a = √(γp/ρ) is the speed of
sound. For two-dimensional problems, separate dissipative terms are computed in
the ξ and η directions. However, this scheme is not TVD in two dimensions.
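A minimal sketch of the time-step selection, assuming NumPy: it evaluates the largest |eigenvalue| in each computational direction from (5), using max|λ| = |ξ_t + ξ_x u + ξ_y v| + a√(ξ_x² + ξ_y²) (and likewise for η), and then applies the criterion Δt = σ/(λ_ξ/Δξ + λ_η/Δη). That particular two-dimensional CFL form, the safety factor σ = 0.9, and the unit computational spacings are assumptions, not values taken from the paper.

```python
import numpy as np


def spectral_radii(rho, u, v, p, xi_m, eta_m, gamma=1.4):
    """Largest |eigenvalue| of the transformed Euler equations in the xi and eta directions.

    xi_m = (xi_t, xi_x, xi_y) and eta_m = (eta_t, eta_x, eta_y) are nodal mesh metrics;
    for each direction max|lambda| = |k_t + k_x u + k_y v| + a * sqrt(k_x^2 + k_y^2).
    """
    a = np.sqrt(gamma * p / rho)  # speed of sound
    radii = []
    for k_t, k_x, k_y in (xi_m, eta_m):
        lam = np.abs(k_t + k_x * u + k_y * v) + a * np.hypot(k_x, k_y)
        radii.append(float(np.max(lam)))
    return tuple(radii)


def select_time_step(lam_xi, lam_eta, safety=0.9, d_xi=1.0, d_eta=1.0):
    """Assumed 2-D CFL-type criterion: dt = safety / (lam_xi/d_xi + lam_eta/d_eta)."""
    return safety / (lam_xi / d_xi + lam_eta / d_eta)


# Stationary Cartesian mesh with dx = dy = 0.1 (xi = x/dx, eta = y/dy), gas at rest with a = 1.
ones = np.ones(4)
lam_xi, lam_eta = spectral_radii(1.4 * ones, 0 * ones, 0 * ones, 1.0 * ones,
                                 (0 * ones, 10 * ones, 0 * ones),
                                 (0 * ones, 0 * ones, 10 * ones))
dt = select_time_step(lam_xi, lam_eta)  # 0.9 / (10 + 10) = 0.045
```

On a moving mesh the metric terms ξ_t and η_t enter directly, so mesh motion that follows a wave reduces |ξ_t + ξ_x u + ξ_y v| and permits a larger Δt.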


c. Error Estimation. A posteriori error estimation is an integral part
of our adaptive system. The general-purpose scheme we modified estimated the
local temporal and spatial portions of the discretization error on a moving
mesh using an algorithm based on Richardson's extrapolation (cf. Arney and
Flaherty [10]). Flaherty and Moore [20] and Berger and Oliger [4] used a
similar form of Richardson's extrapolation to estimate error on uniform
meshes.

The Richardson's extrapolation error estimation procedure is expensive,
costing up to four times more to compute than the solution itself, and is based on
the assumption of a smooth solution, which is not the case for blast problems
(cf. Section 4). Therefore, we use a less expensive error indicator, called
the density switch, which is computed using one-sided differences. We also used
a form with centered differences. However, this error indicator may not converge
to zero as the mesh is refined in blast problems, and therefore a maximum level of
refinement must be used in connection with the error tolerance to control mesh
refinement.


3. ADAPTIVE MESH PROCEDURES. Our adaptive procedure is outlined in
Figure 1. This procedure integrates the Euler equations from
time tinit to tfinal while keeping the local error estimates below a
tolerance tol. The base-level time step Δt is initially specified, but
is changed during the solution to maintain stability.
procedure adaptive_PDE_solver(tinit, Δt, tfinal, tol: real; M, N: integer);
begin
   Generate an initial base mesh;
   t := tinit;
   while t < tfinal do
   begin
      mesh_move(M, N, Δt, tol);
      local_refine(0, t, Δt, tol);
      t := t + Δt;
      Select an appropriate Δt;
      if base mesh is too distorted then regenerate a base mesh
   end
end { adaptive_PDE_solver };

Figure  1.  Description  of  our  adaptive  algorithm  to  solve  (1)  to  within  a 
tolerance  of  (tol). 

The rectangular domain Ω is initially discretized into a coarse moving
spatial grid of M × N quadrilateral cells. The base mesh is moved for each
base time step Δt and (1) is solved on this mesh. This is followed by
recursive local mesh refinement. The value of Δt is calculated from the
eigenvalues (5) and the Courant-Friedrichs-Lewy condition to maintain
stability for the next time step. Finally, a new base mesh is generated if
necessary. The solution, error estimation, mesh moving, local refinement, and
mesh regeneration procedures are explicit and uncoupled from one another,
reducing their computational cost and providing flexibility. Because of this
modularity, the solution and error estimation procedures could be replaced with
ones suitable for the Navier-Stokes equations.

a.  Mesh  Moving.  Our  mesh  moving  procedure  is  based  on  an  intuitive 
approach.  The  essential  idea  is  that  the  mesh  moves  to  follow  isolated 
nonuniformities,  such  as  wave  fronts  and  shocks,  which  manifest  themselves 
with  high  error  estimates.  Proper  mesh  movement  generally  reduces  dispersive 
errors  and  can  allow  the  use  of  larger  time  steps  if  the  eigenvalues  are 
reduced  in  the  Courant-Friedrichs-Lewy  condition  while  maintaining  accuracy 
and  stability. 

The algorithm for our mesh moving procedure mesh_move is presented in
Figure  2.  At  each  base  time,  we  scan  the  base  mesh  and  locate  significant- 
error  nodes  as  those  having  error  indicator  greater  than  twice  the  mean  nodal 
error  estimate  and  also  greater  than  ten  percent  of  tol.  This  strategy  avoids 
having  the  mesh  respond  to  fluctuations  with  too  small  an  error  estimate,  yet 
is  sensitive  enough  to  avoid  missing  significant  dynamic  phenomena.  If  there 
are  no  significant-error  nodes,  computation  proceeds  on  a  stationary  mesh. 

The  nearest  neighbor  clustering  algorithm  of  Berger  and  Oliger  [4]  is  used  to 
gather  the  significant  error  nodes  into  rectangular  error  clusters. 
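The selection rule for significant-error nodes is simple enough to state directly in code. The sketch below (NumPy assumed) flags nodes whose indicator exceeds both twice the mean and ten percent of tol, and then, purely for illustration, encloses all flagged nodes in a single index rectangle. That bounding box is a hypothetical stand-in; it is not the Berger-Oliger nearest neighbor clustering used in the paper, which can produce several clusters.

```python
import numpy as np


def significant_error_nodes(err, tol):
    """Flag nodes whose error indicator exceeds twice the mean nodal value and 10% of tol."""
    return (err > 2.0 * err.mean()) & (err > 0.1 * tol)


def bounding_box(mask):
    """Index rectangle (i_min, i_max, j_min, j_max) enclosing all flagged nodes, or None."""
    ii, jj = np.nonzero(mask)
    if ii.size == 0:
        return None  # no significant-error nodes: computation proceeds on a stationary mesh
    return int(ii.min()), int(ii.max()), int(jj.min()), int(jj.max())


err = np.zeros((10, 10))
err[3:5, 5:7] = 1.0                      # hypothetical indicator with one hot spot
mask = significant_error_nodes(err, tol=1.0)
print(bounding_box(mask))                # (3, 4, 5, 6)
```

The ten-percent floor matters when the whole field is nearly uniform: raising tol to 20 in this example leaves no flagged nodes, so the mesh would not move.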
procedure mesh_move(M, N: integer; Δt, tol: real);
begin
   for j := 1 to M × N do
      compile list of significant-error nodes using tol;
   if no significant-error nodes then no mesh movement
   else cluster significant-error nodes into k error clusters;
   for m := 1 to k do
      calculate propagation of error cluster m over Δt;
   for j := 1 to M × N do
      move node j based on a function of the velocity of the nearest error clusters;
   smooth the node movement to reduce deformation
end { mesh_move };

Figure 2. Pseudo-PASCAL description of the mesh moving algorithm used to move the
mesh for one base time step (Δt).

We  determine  individual  node  movement  from  the  velocity  of  propagation, 
the  orientation,  and  the  size  of  the  error  clusters.  We  assume  that  nodes  in 
the  same  cluster  have  related  solution  characteristics,  so  that  we  determine 
individual  node  movement  from  the  propagation  of  the  center  of  the  nearest 
error  cluster. 
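A hypothetical sketch of the node movement: each node is displaced by the motion of the center of its nearest error cluster over the base time step, with a linear taper so that nodes farther than a chosen radius do not move. The taper and its radius are our assumptions; the paper also uses the orientation and size of the clusters and smooths the resulting movement, which this sketch omits.

```python
import numpy as np


def move_nodes(nodes, old_centers, new_centers, radius):
    """Displace each node with the nearest error-cluster center, tapered with distance.

    nodes: (n, 2) node coordinates; old_centers, new_centers: (k, 2) cluster centers at
    the start and end of the base time step; radius: hypothetical taper length.
    """
    shift = new_centers - old_centers                     # displacement of each cluster center
    dist = np.linalg.norm(nodes[:, None, :] - old_centers[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)                         # index of the nearest cluster
    d = dist[np.arange(len(nodes)), nearest]
    weight = np.clip(1.0 - d / radius, 0.0, 1.0)          # 1 at the center, 0 beyond radius
    return nodes + weight[:, None] * shift[nearest]


nodes = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
moved = move_nodes(nodes, np.array([[0.0, 0.0]]), np.array([[0.1, 0.0]]), radius=1.0)
```

A node at the cluster center moves with it, a node halfway to the taper radius moves half as far, and distant nodes stay put, which limits mesh distortion away from the moving feature.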

b.  Local  Mesh  Refinement.  The  local  refinement  procedure  is  invoked 
after  the  base  mesh  has  moved.  Our  refinement  strategy  consists  of  first 
calculating  the  solution  and  error  estimates  on  the  base  mesh.  Finer  grids  are 
created in intolerable-error regions by locally bisecting the time steps and
the  sides  of  the  quadrilateral  cells  of  the  base  grid.  The  solution  and  error 
estimates  are  computed  on  the  finer  grids.  The  refinement  scheme  is 
recursive;  thus,  fine  grids  may  be  refined  to  create  finer  grids. 

This grid relationship leads to a tree data structure. Information
regarding the base grid is stored in the root node, or level 0, of the tree.
Subgrids of the base grid are stored in level 1 of the tree. The structure
continues, with a grid at level ℓ having subgrids at level ℓ + 1. Grids at
level ℓ are given an arbitrary ordering, and we denote them by G[ℓ, j].

Our recursive local mesh refinement algorithm local_refine is presented
in Figure 3. The procedure integrates (1) from time tinit to tinit + Δt,
attempting to satisfy the error tolerance tol.

Our technique for introducing finer subgrids consists of four steps: (i)
scanning level ℓ grids to locate intolerable-error nodes, (ii) clustering
those nodes into rectangular regions [4], (iii) buffering the regions in order
to reduce problems associated with prescribing initial and boundary conditions
at coarse/fine grid interfaces, and (iv) refining, cell by cell, the level ℓ
meshes and time steps inside the buffered clusters. Base mesh motion is
maintained on the refined subgrids to ensure proper nesting in their parent
grid. If there are no intolerable-error nodes, the solution is acceptable and
no further refinement is necessary.
procedure local_refine(ℓ: integer; tinit, Δt, tol: real);
begin
   for j := 1 to N[ℓ] do
   begin
      Integrate the partial differential system from tinit to tinit + Δt
         on grid G[ℓ, j];
      Calculate error indicators at tinit + Δt at all nodes of grid G[ℓ, j];
      if any error indicators > tol then introduce level ℓ + 1 subgrids of G[ℓ, j]
   end { for };
   if any error indicators > tol then
   begin
      local_refine(ℓ + 1, tinit, Δt/2, tol/2);
      local_refine(ℓ + 1, tinit + Δt/2, Δt/2, tol/2)
   end
end { local_refine };

Figure 3. Pseudo-PASCAL description of a recursive local refinement procedure
to find a solution of the partial differential system (1) on all grids at
level ℓ of the tree.
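The control flow of Figure 3 (time step and tolerance both halved at each new level) can be traced with a short recursive sketch. Here needs_refinement is a hypothetical predicate standing in for integrating the level-ℓ grids and comparing their error indicators with tol, and max_level plays the role of the maximum refinement level mentioned in Section 2; both names are ours.

```python
def local_refine(level, t0, dt, tol, needs_refinement, log, max_level=2):
    """Trace the recursion of Figure 3: each refinement level halves both dt and tol.

    needs_refinement(level, t0, dt, tol) is a hypothetical stand-in for integrating all
    grids at this level and checking whether any error indicator exceeds tol.
    Every (level, t0, dt, tol) integration is appended to log.
    """
    log.append((level, t0, dt, tol))
    if level < max_level and needs_refinement(level, t0, dt, tol):
        # Two half steps on the finer level, each with half the tolerance.
        local_refine(level + 1, t0, dt / 2, tol / 2, needs_refinement, log, max_level)
        local_refine(level + 1, t0 + dt / 2, dt / 2, tol / 2, needs_refinement, log, max_level)


log = []
local_refine(0, 0.0, 0.1, 0.6, lambda *args: True, log, max_level=2)
for entry in log:
    print(entry)
```

With a predicate that always requests refinement and max_level = 2, one base step of 0.1 produces one level-0 integration, two level-1 substeps of 0.05 with tolerance tol/2, and four level-2 substeps of 0.025 with tolerance tol/4, so each level covers the full base step. The depth-first order of the log mirrors the tree described above.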

c.  Generation  and  Regeneration  of  the  Base  Mesh.  The  efficiency  of  our 
adaptive  strategies  depends  on  the  ability  to  generate  a  suitable  initial  base 
mesh  and  to  regenerate  a  new  base  mesh  should  it  become  distorted.  The  two 
essential  elements  of  mesh  generation  or  regeneration  are  determination  of  the 
number  of  nodes  and  their  optimal  location.  A  base  mesh  with  too  few  nodes 
will  result  in  excessive  refinement  or  may  completely  miss  a  fine  structure 
while one having too many nodes will reduce efficiency. Our approach is to
use the error estimates of a trial solution, computed for one time step on a K × L
mesh, to determine the number of nodes (M × N) and their placement in the initial
mesh so that the mesh approximately equidistributes the error estimates.
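The idea of placing nodes so that they approximately equidistribute the error estimates can be illustrated in one dimension. The sketch below (NumPy assumed) is a swapped-in simplification, not the authors' two-dimensional cluster-based procedure: it integrates a sampled error density, normalizes it, and inverts the cumulative curve so that each interval carries the same share of the error. The number of nodes is given here, whereas the paper also determines M and N from the error estimates.

```python
import numpy as np


def equidistribute(x, err, n_nodes):
    """Return n_nodes points on [x[0], x[-1]] such that every interval carries an equal
    share of the integrated error density err (sampled at the points x)."""
    # Cumulative trapezoidal integral of the error density, normalized to run from 0 to 1.
    w = np.concatenate(([0.0], np.cumsum(0.5 * (err[1:] + err[:-1]) * np.diff(x))))
    w /= w[-1]
    # Invert the cumulative curve at equally spaced levels (requires err > 0 so w increases).
    return np.interp(np.linspace(0.0, 1.0, n_nodes), w, x)


x = np.linspace(0.0, 1.0, 1001)
err = 1.0 + 100.0 * np.exp(-((x - 0.5) / 0.05) ** 2)   # hypothetical error peak at x = 0.5
nodes = equidistribute(x, err, 21)
```

A uniform error density returns uniform spacing, and the peaked density above pulls nodes toward x = 0.5 while keeping the end points fixed.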

The  node  placement  algorithm  for  the  base  mesh  is  similar  to  the  mesh 
moving  algorithm  except  that  nodes  are  moved  toward  the  center  of  the  nearest 
error  cluster.  Nodes  nearly  equidistant  from  two  or  more  error  clusters  are 
moved  by  a  weighted  average  toward  those  nearest  error  clusters  to  maintain  a 
smooth mesh. Nodes on ∂Ω remain on ∂Ω, and nodes near the boundary are moved
a  reduced  distance  in  order  to  prevent  the  formation  of  large  aspect  ratios. 
This  construction  generates  a  base  mesh  that  depends  on  the  solution  of  (1)  as 
well  as  the  initial  conditions.  The  mesh  generation  algorithm  is  presented  in 
Figure  4. 

The  base  mesh  can  become  distorted  by  mesh  motion  for  some  problems.  We 
regenerate  a  new  base  mesh  whenever  this  happens.  The  mesh  regeneration  or 
static  rezone  procedure  consists  of  three  steps:  (i)  determining  the  need  for 
a  new  base  mesh,  (ii)  creating  the  new  mesh,  and  (iii)  interpolating  the 
solution  from  the  old  base  mesh  to  the  new  base  mesh. 
procedure mesh_generation(K, L, M, N: integer; Δt, tol: real);
begin
   solve (1) using Δt on K × L uniform test mesh;
   determine new mesh spacing M and N from error estimation;
   for j := 1 to K × L do
      compile list of error nodes exceeding tol;
   if no error nodes then use uniform mesh
   else cluster error nodes into P clusters;
   for j := 1 to M × N do
   begin
      move nodes toward center of nearest error cluster;
      smooth node spacing near boundaries of domain and between clusters
   end { for }
end { mesh_generation };

Figure  4.  Pseudo-PASCAL  description  of  mesh  generation  algorithm. 

The  base  mesh  is  regenerated  whenever  any  interior  angle  of  a  cell  is 
less  than  40  degrees  or  more  than  140  degrees  or  the  aspect  ratio  for  any  cell 
is  greater  than  15.  A  new  base  mesh  is  generated  using  the  same  procedure 
used  to  generate  the  initial  one.  The  error  clusters  used  for  regeneration 
are  those  already  determined  in  the  mesh  moving  step.  Once  the  new  base  mesh 
has  been  constructed,  the  solution  on  the  old  one  is  interpolated  to  the  new 
one  using  bilinear  interpolation. 
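The regeneration test can be written directly from the thresholds quoted above (angles below 40 or above 140 degrees, aspect ratio above 15). In the sketch below (NumPy assumed), cells are quadrilaterals given as four vertices in order; defining the aspect ratio as longest side over shortest side is our assumption. The bilinear formula is shown in a cell's local coordinates (s, t); mapping a physical point of the new mesh to (s, t) in an old cell, which the full interpolation also requires, is not shown.

```python
import numpy as np


def interior_angles(quad):
    """Interior angles in degrees of a quadrilateral whose 4 vertices are listed in order."""
    angles = []
    for k in range(4):
        a = quad[k - 1] - quad[k]            # edge to the previous vertex
        b = quad[(k + 1) % 4] - quad[k]      # edge to the next vertex
        cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        angles.append(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))
    return angles


def aspect_ratio(quad):
    """Assumed definition: longest side length divided by shortest side length."""
    sides = [np.linalg.norm(quad[(k + 1) % 4] - quad[k]) for k in range(4)]
    return max(sides) / min(sides)


def needs_regeneration(cells, min_angle=40.0, max_angle=140.0, max_aspect=15.0):
    """True if any cell has an angle outside [min_angle, max_angle] or too large an aspect ratio."""
    for quad in cells:
        angles = interior_angles(quad)
        if min(angles) < min_angle or max(angles) > max_angle or aspect_ratio(quad) > max_aspect:
            return True
    return False


def bilinear(f00, f10, f01, f11, s, t):
    """Bilinear interpolation of corner values at local cell coordinates (s, t) in [0, 1]^2."""
    return (1 - s) * (1 - t) * f00 + s * (1 - t) * f10 + (1 - s) * t * f01 + s * t * f11


square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(needs_regeneration([square]))  # False: all angles are 90 degrees, aspect ratio 1
```

A rhombus with a 30 degree corner or a 20:1 rectangle would both trigger regeneration under these limits.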


4.  COMPUTATIONAL  EXAMPLES. 


EXAMPLE  1.  Consider  (1)  where  a  planar  Mach  10  shock  in  air  moves  down  a 
channel  containing  a  wedge  with  a  half-angle  of  thirty  degrees.  This  problem 
was used as a test problem by Woodward and Colella [21]. Like them, we orient
a rectangular computational domain, -0.3 < x < 3.2, 0 < y < 1, so that the top
edge of the wedge is on the bottom of the domain, along y = 0.
Thus, in the computational domain it appears as if a Mach 10 shock is impinging
on a flat plate at an angle of sixty degrees. The initial conditions are

$$
\begin{aligned}
&\rho = 8.0, \quad p = 116.5, \quad e = 563.5, \quad u = 4.125\sqrt{3}, \quad v = -4.125, && \text{if } y > \sqrt{3}\,(x - 1/6), \\
&\rho = 1.4, \quad p = 1.0, \quad e = 2.5, \quad u = 0, \quad v = 0, && \text{if } y < \sqrt{3}\,(x - 1/6).
\end{aligned} \tag{7}
$$
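The state values in (7) can be cross-checked. The sketch below computes the Rankine-Hugoniot state behind a Mach 10 shock moving into air with ρ = 1.4, p = 1 (so the sound speed is 1), directs the induced velocity along the shock normal, 30 degrees below the x-axis (normal to the line y = √3(x - 1/6)), and then evaluates e from (2). It is a consistency check written by us, not part of the paper's code.

```python
import math


def post_shock_state(mach, rho1=1.4, p1=1.0, gamma=1.4, angle_deg=30.0):
    """Rankine-Hugoniot state behind a normal shock moving into gas at rest,
    with the induced velocity directed angle_deg below the x-axis."""
    a1 = math.sqrt(gamma * p1 / rho1)            # pre-shock sound speed
    m2 = mach * mach
    rho2 = rho1 * (gamma + 1) * m2 / ((gamma - 1) * m2 + 2)
    p2 = p1 * (1 + 2 * gamma / (gamma + 1) * (m2 - 1))
    speed = mach * a1 * (1 - rho1 / rho2)        # fluid speed behind the shock (mass balance)
    th = math.radians(angle_deg)
    u2, v2 = speed * math.cos(th), -speed * math.sin(th)
    e2 = p2 / (gamma - 1) + 0.5 * rho2 * (u2**2 + v2**2)   # total energy from (2)
    return rho2, p2, e2, u2, v2


rho2, p2, e2, u2, v2 = post_shock_state(10.0)
```

It returns ρ = 8, p = 116.5, u = 8.25 cos 30° = 4.125√3, v = -4.125 and e = 116.5/0.4 + 8(8.25)²/2 = 563.5, matching (7); the unshocked state gives e = 1/0.4 = 2.5.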


Along the left boundary and the bottom boundary left of the wedge, we
prescribe the Dirichlet conditions of (7). Along the top boundary we prescribe
the exact motion of a Mach 10 shock. Along the right boundary, all normal
derivatives are set to 0. Along the wedge, reflecting boundary conditions are
used.

The  solution  of  this  problem  is  a  self-similar  structure  called  a 
double-Mach reflection [22]. The structures have very fine geometric detail.
The  interesting  structures  are  primarily  confined  to  a  small  region  that  moves 
along  the  wedge  behind  the  incident  shock. 
We calculate a solution for 0 < t < 0.19. Refinement was restricted to
a maximum of two levels, and a tolerance of 0.6 in the maximum norm was
prescribed. This restriction is necessary because our pointwise error estimate,
based on the assumption of smooth solutions, is not appropriate for problems having
discontinuities. Without restricting the maximum level of refinement, we
could refine excessively in the vicinity of a discontinuity and exhaust the
available storage. Our base mesh generation procedure provided a 29 × 11
discretization of the domain.


The  sequence  of  meshes  in  Figure  5  shows  that  the  coarse  mesh  is  able  to 
follow  the  dynamic  structures  and  that  refinement  is  performed  in  the  vicinity 
of  discontinuities.  The  density  obtained  from  the  adaptive  mesh  calculation, 
which  used  a  range  of  nodes  from  960  on  the  first  time  step  to  3450  on  the 
last  time  step,  is  shown  in  Figure  6  (top).  The  adaptive  solution  compares 
favorably  with  the  solution  computed  on  a  120  x  40  uniform  mesh  shown  in 
Figure  6  (bottom). 

Severe  distortion  of  the  mesh  in  the  reflected  shock  region  caused  a 
static mesh regeneration to occur at t = 0.162. The overhead of mesh moving
for  this  problem  is  approximately  five  percent  in  terms  of  total  computational 
time.  The  CPU  time  for  the  adaptive  solution  using  94  coarse  mesh  time  steps 
was  51  percent  of  the  time  for  the  stationary  uniform  mesh  solution  using 
200  coarse  mesh  time  steps. 

EXAMPLE 2. Consider (1) where an infinite cylindrical piston is expanding
radially, creating a radially expanding shock. We orient the computational
domain, 0 < x < 0.05, 0 < y < 0.05, to solve in one quadrant of the expansion.
The initial conditions were computed by solving the following ordinary
differential equations from [23,24,25]:


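$$\frac{dv}{dr^*} = \frac{j\,v}{r^*}\left[\frac{(U_p r^* - v)^2}{a^2} - 1\right]^{-1}, \qquad
\frac{da}{dr^*} = \frac{\gamma - 1}{2}\,\frac{U_p r^* - v}{a}\,\frac{dv}{dr^*}, \tag{8}$$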


where U_p is the piston velocity, v is the fluid velocity, a is the acoustic
speed, r* is a nondimensional variable defined as r/(U_p t), and j is a geometry
parameter, which is one for a cylindrical piston. Here r is the radial distance
from the center of the cylinder. These same equations apply to an expanding
plane (j = 0) and an expanding sphere (j = 2). The expanding sphere problem
is an axisymmetric two-dimensional problem which can be solved with our code
by adding the appropriate forcing terms, found in Carofano [26], to (1).


The initial conditions and shock Mach number (M) can be determined for a given
piston velocity by a bisection method that matches the fluid velocity at the piston
face to the piston velocity. Tables of initial values, Mach numbers, and shock
locations have been assembled in Brantley [25], allowing us to use a Runge-Kutta
scheme to solve (8) directly. The parameter values chosen for this problem are
M = 1.7752 and U_p = 1.6185.

The  solution  computed  with  (8)  for  three  different  times  is  shown  in  Figure  7. 
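A minimal sketch of a classical fourth-order Runge-Kutta integration of the similarity equations, marching from the shock inward to the piston face r* = 1. It assumes (8) has the form dv/dr* = (j v/r*)[(U_p r* - v)²/a² - 1]⁻¹ and da/dr* = ((γ - 1)/2)((U_p r* - v)/a) dv/dr*, which is what the isentropic similarity equations give with r* = r/(U_p t). The starting values at the shock must come from the Rankine-Hugoniot conditions or from tables such as Brantley [25]; the numbers in any call are placeholders of ours.

```python
def similarity_rhs(r, v, a, up, j, gamma=1.4):
    """Right-hand sides of (8): dv/dr* and da/dr* for the self-similar piston flow."""
    rel = up * r - v                                  # U_p r* - v
    dv = (j * v / r) / ((rel / a) ** 2 - 1.0)
    da = 0.5 * (gamma - 1.0) * (rel / a) * dv
    return dv, da


def integrate_to_piston(r_shock, v_shock, a_shock, up, j, n=200, gamma=1.4):
    """Classical RK4 from the shock (r* = r_shock) inward to the piston face (r* = 1)."""
    h = (1.0 - r_shock) / n                           # negative step: marching inward
    r, v, a = r_shock, v_shock, a_shock
    for _ in range(n):
        k1 = similarity_rhs(r, v, a, up, j, gamma)
        k2 = similarity_rhs(r + h / 2, v + h / 2 * k1[0], a + h / 2 * k1[1], up, j, gamma)
        k3 = similarity_rhs(r + h / 2, v + h / 2 * k2[0], a + h / 2 * k2[1], up, j, gamma)
        k4 = similarity_rhs(r + h, v + h * k3[0], a + h * k3[1], up, j, gamma)
        v += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        a += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        r += h
    return v, a
```

For a given piston velocity, a bisection on the shock Mach number would adjust the starting values until the returned v equals U_p. For j = 0 the right-hand sides vanish and the flow between shock and piston is uniform, which gives a simple check; for j = 1 the fluid velocity and sound speed increase toward the piston.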
Figure 5. Grids created for the adaptive mesh solution at t = 0.038, 0.076,
0.114, 0.152, and 0.19 (top to bottom). The rectangular boxes represent the
error clusters used to move the mesh for the current time step.
Figure 8. Contours of the density at t = 0.0096 using a uniform, stationary
26 × 26 mesh.


Our adaptive method was used to solve this problem. For comparison, the
density computed for t = 0.0096 on a uniform stationary mesh is shown in
Figure 8. The meshes at a sequence of four time steps and the density
solution for t = 0.0096 using mesh refinement only are shown in Figures 9 and
10, respectively. Four adaptive meshes at different times and the density
solution for t = 0.0096 using both mesh moving and refinement are shown in
Figures 11 and 12. The algorithm performed a static mesh regeneration at
t = 0.0085. Dirichlet boundary conditions were used on the left boundary
(x = 0), and a reflective symmetry boundary was implemented on the bottom
boundary (y = 0). The results in Figure 12 show that the refinement and boundary
conditions performed better on the bottom boundary than on the
left boundary. Due to memory restrictions on a PRIME 850 minicomputer,
refinement was restricted to one level and a tolerance of 0.05 was prescribed.
A base mesh of 26 × 26 was used for all the computations.
Figure 9. Grids created for the mesh refinement method at t = 0 (upper left),
t = 0.0032 (upper right), t = 0.0064 (lower left), and t = 0.0096 (lower right).


Figure 10. Contours of the density at t = 0.0096 using one level of mesh
refinement on a 26 × 26 base mesh.

5. DISCUSSION. We have described an adaptive procedure for solving the Euler
equations in two space dimensions that combines both mesh moving and local
mesh refinement techniques. The algorithm also contains procedures for
initial mesh generation and mesh regeneration. We used MacCormack's scheme
with Davis's artificial viscosity and a density switch error indicator. This
combination of techniques provided good results on example problems while
costing less than a comparable uniform mesh calculation or a comparable
solution computed using mesh refinement only (cf. Arney and Flaherty [3]).

There is still work to be done in order to make full use of the power of
adaptive methods. Better error estimation is needed so that accurate error
estimates can be obtained near discontinuities, avoiding excessive mesh refinement. The
algorithm must be interfaced with a grid generation package for general domain
geometry. The greater efficiency of adaptive techniques will be most
beneficial in three dimensions. Therefore, our techniques must be able to take
advantage of the latest advances in vector and parallel computing. The tree
is a highly parallel structure, and we hope to develop a procedure to exploit
our mesh refinement data structure in a parallel computing environment. This
process will probably include both static and dynamic allocation of multiple
processors.
Figure 11. Grids created for the adaptive mesh solution at t = 0 (upper left),
t = 0.0032 (upper right), t = 0.0064 (lower left), and t = 0.0096 (lower right).
Figure 12. Contours of the density at t = 0.0096 using the adaptive method on
a 26 × 26 base mesh.
HOW TO DESCRIBE OSCILLATIONS OF SOLUTIONS OF
NONLINEAR  PARTIAL  DIFFERENTIAL  EQUATIONS 

Luc  TARTAR 

Carnegie  Mellon  University 


I want to describe here some developments in progress concerning the
mathematical tools used to describe the relations between microscopic
and macroscopic levels. Before describing the new approach, I will first
sketch my previous point of view on that question and list the defects of
the old classical approach, so that one can see which defects will be
corrected by the new approach and which ones remain.

A  Classical  Approach 

There are different mathematical models used for the purpose of
describing the relations between microscopic and macroscopic levels, and
the most common one is to use a probabilistic framework: with $\omega$ denoting the
generic point of a probability space $\Omega$ and $E$ denoting the expectation (i.e.
the integration on $\Omega$), one can say that if a microscopic variable is denoted
by $U(x,\omega)$, then $E(U(x,\cdot))$ will be the associated macroscopic variable.

Another fashionable model is the periodic modulation setting, where
one deals with functions defined on $\Omega \times \mathbb{R}^N$ and periodic in $y$ with unit cell $Y$.
In that model, if a microscopic variable is denoted by $U(x,y)$, then the
corresponding macroscopic variable is $\int_Y U(x,y)\,dy/\mathrm{meas}(Y)$.

The model that I have been advocating for more than 15 years (which
originated in joint work with F. MURAT) is based on the use of weak
convergence: one considers a sequence $U_\varepsilon$ of functions defined on an open
set $\Omega$ of $\mathbb{R}^N$ and taking values in $\mathbb{R}^p$; one says that this sequence shows
oscillations if it converges weakly but not strongly. Generally, if a
sequence $V_\varepsilon$ converges weakly to $V_0$ as $\varepsilon \to 0$, then one calls $V_\varepsilon$ a
microscopic variable and $V_0$ the corresponding macroscopic variable. This
framework extends the periodic setting: indeed, if one sets $U_\varepsilon(x) = U(x, x/\varepsilon)$,
then as $\varepsilon \to 0$, $U_\varepsilon$ converges weakly to $U_0$ given by
$U_0(x) = \int_Y U(x,y)\,dy/\mathrm{meas}(Y)$ (assuming that $U$ is periodic in $y$ and satisfies some
regularity hypotheses in $(x,y)$).

I do not want to go into the technical details of functional analysis
involved in the definition of weak and weak* topologies, but it is worth
noticing that this point of view is the one used by physicists when they
replace a discrete distribution of point masses by a smooth density. The
fact that for every continuous function $f$ on $[0,L]$ one has

$$\frac{L}{m}\sum_{i=1}^{m} f\left(\frac{iL}{m}\right) \to \int_0^L f(x)\,dx \quad \text{as } m \to \infty$$

is equivalent to a weak* convergence of measures, namely

$$\frac{L}{m}\sum_{i=1}^{m} \delta_{iL/m} \rightharpoonup \chi_{[0,L]} \quad \text{weak* as } m \to \infty,$$

where $\delta_a$ denotes the Dirac mass at the point $a$ (it acts on $f$ by evaluating $f$
at $a$) and $\chi_{[0,L]}$ is the characteristic function of the interval $[0,L]$.
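The equivalence can be tried numerically for one continuous test function. The sketch below (pure Python; f = cos on [0, 2] is our choice) evaluates the integral of f against the discrete measure that puts mass L/m at each point iL/m and compares it with the integral over [0, L].

```python
import math


def discrete_measure_integral(f, L, m):
    """Integral of f against sum_{i=1}^{m} (L/m) * delta_{iL/m}, i.e. (L/m) * sum f(iL/m)."""
    return (L / m) * sum(f(i * L / m) for i in range(1, m + 1))


L = 2.0
exact = math.sin(L)  # integral of cos over [0, L]
errors = {m: abs(discrete_measure_integral(math.cos, L, m) - exact) for m in (10, 100, 1000)}
for m, err in errors.items():
    print(f"m = {m:5d}   error = {err:.2e}")
```

The error decays roughly like 1/m, as expected for this right-endpoint sum.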

Roughly speaking, a sequence of functions $U_\varepsilon$ converges weakly to $U_0$ if $\int_\omega (U_\varepsilon(x)-U_0(x))\,dx \to 0$ for every measurable set $\omega$. It converges strongly if $\int_\omega |U_\varepsilon(x)-U_0(x)|\,dx \to 0$.

With these definitions one sees easily that $\sin(x/\varepsilon) \rightharpoonup 0$ weakly, while $\sin^2(x/\varepsilon) \rightharpoonup 1/2$ weakly.

A similar example with a simple physical interpretation is the following. Let $U_\varepsilon(x)$ denote the microscopic velocity, that is, the exact velocity of a particle at the point $x$. Then the macroscopic velocity $U_0(x)$ is the average velocity near the point $x$. The microscopic kinetic energy is $k_\varepsilon(x) = |U_\varepsilon(x)|^2/2$. The macroscopic total energy (the weak limit of $k_\varepsilon$) is $k_0(x) = |U_0(x)|^2/2 + e(x)$, where $e(x)\ge 0$ denotes the internal energy. The internal energy is then a macroscopic quantity without analog at the microscopic level.
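
Both examples can be checked numerically in one dimension. The sketch below assumes a hypothetical constant macroscopic velocity $u_0 = 2$, perturbed by $\sin(x/\varepsilon)$ on an arbitrary interval $\omega=(0.3,1.7)$. It checks three things:

- averages of $U_\varepsilon$ approach $u_0$;
- averages of $|U_\varepsilon-u_0|$ stay near $2/\pi$, so the convergence is not strong;
- the averaged kinetic energy exceeds $|u_0|^2/2$ by an internal energy close to $1/4$, half the weak limit of $\sin^2$.

```python
import math

def average(g, a, b, n=200_000):
    """Midpoint-rule approximation of (1/(b-a)) * integral of g over (a, b)."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) / n

a, b = 0.3, 1.7     # hypothetical interval omega
eps = 1e-3
u0 = 2.0            # hypothetical macroscopic velocity
u = lambda x: u0 + math.sin(x / eps)                 # microscopic velocity U_eps

mean_u = average(u, a, b)                            # tends to u0 as eps -> 0
mean_dev = average(lambda x: abs(u(x) - u0), a, b)   # tends to 2/pi, not 0: no strong convergence
mean_k = average(lambda x: 0.5 * u(x) ** 2, a, b)    # averaged microscopic kinetic energy
internal = mean_k - 0.5 * mean_u ** 2                # tends to 1/4, the internal energy e
print(mean_u, mean_dev, 2 / math.pi, internal)
```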

These examples show that constitutive relations, which are of the form $U_\varepsilon(x) \in K$ for a closed set $K$ of $\mathbb{R}^p$, will not always be satisfied by the macroscopic quantities. In general, one can only say that $U_0(x)$ belongs to the closed convex hull of $K$.
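
For instance, the pair $(\sin(x/\varepsilon), \sin^2(x/\varepsilon))$ always lies on the parabola arc $K=\{(s,s^2): |s|\le 1\}$, but its weak limit $(0,1/2)$ does not. That limit is, however, a convex combination of points of $K$. The small check below verifies this arithmetic; the three points and their weights are chosen by hand for illustration.

```python
def in_K(p, tol=1e-12):
    """Membership in the example constitutive set K = {(s, s^2) : |s| <= 1}."""
    s, t = p
    return abs(s) <= 1 + tol and abs(t - s * s) <= tol

weak_limit = (0.0, 0.5)                         # weak limit of (sin(x/eps), sin(x/eps)**2)
points = [(-1.0, 1.0), (1.0, 1.0), (0.0, 0.0)]  # three points of K
weights = [0.25, 0.25, 0.5]                     # nonnegative, summing to 1

combo = tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(2))
print(in_K(weak_limit), all(in_K(p) for p in points), combo)
```

This prints `False True (0.0, 0.5)`.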

A natural question is then to describe the weak limits of $F(U_\varepsilon)$ for every continuous function $F$. The answer is given by Young measures. Suppose $K$ is closed and bounded, and extract a subsequence. Then there are probability measures $\nu_x$, living on $K$ and depending measurably on $x\in\Omega$, such that for every continuous function $F$, $F(U_\varepsilon)$ converges weakly to a limit $\ell$ given by $\ell(x) = \int_K F(k)\,d\nu_x(k)$. Roughly speaking, measuring the values of $F(U_\varepsilon)$ near $x$ will give a random answer following the probability measure $\nu_x$.

Following this definition, suppose $U(x,y)$ is periodic in $y$ and we consider $U_\varepsilon(x) = U(x,x/\varepsilon)$. Then, under some regularity hypotheses, the Young measures are defined by $\int_K F(k)\,d\nu_x(k) = \int_Y F(U(x,y))\,dy/\operatorname{meas}(Y)$.
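
The "random answer" interpretation can be seen by sampling. Take the hypothetical choice $U(x,y)=\sin 2\pi y$. The formula above then says that $\nu_x$ is the distribution of $\sin 2\pi y$ for $y$ uniform in $[0,1)$. This is the arcsine law, with distribution function $\tfrac12+\arcsin(t)/\pi$. The sketch samples $U_\varepsilon$ on a small window around $x_0=0.5$ and compares the empirical distribution function with that law. The window size and $\varepsilon$ are arbitrary.

```python
import math

def young_cdf(t):
    """nu((-inf, t]) for the law of sin(2*pi*y), y uniform on [0, 1): the arcsine law, |t| <= 1."""
    return 0.5 + math.asin(t) / math.pi

def empirical_cdf(values, t):
    return sum(v <= t for v in values) / len(values)

eps = 1e-4
x0, half_width, n = 0.5, 0.01, 100_000        # hypothetical measurement window around x0
xs = [x0 - half_width + 2 * half_width * (k + 0.5) / n for k in range(n)]
samples = [math.sin(2 * math.pi * x / eps) for x in xs]   # values of U_eps(x) = sin(2*pi*x/eps)

for t in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(t, round(empirical_cdf(samples, t), 4), round(young_cdf(t), 4))
```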

Another natural question is how to take into account that the functions we study, which are related to problems in continuum mechanics or physics, satisfy some differential equations. These are balance equations of the form

$$\sum_{j=1}^{p}\sum_{k=1}^{N} A_{ijk}\,\frac{\partial U_{\varepsilon,j}}{\partial x_k} \to 0 \ \text{"strongly" for } i=1,\dots,q,$$

where the $A_{ijk}$ are real constants. Usually $U_\varepsilon$ will converge weakly in $(L^2(\Omega))^p$, and "strongly" will mean strongly in $(H^{-1}_{\mathrm{loc}}(\Omega))^q$. This is implied by weak convergence in $(L^2(\Omega))^q$.

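As the balance equations are taken in the sense of distributions, the functions may have jumps across smooth hypersurfaces. If there is a jump $\lambda$ in $U_\varepsilon$ at a point where the normal is $\xi$, then $(\lambda,\xi)$ must belong to the following characteristic set $V$:

$$V = \Big\{(\lambda,\xi)\in\mathbb{R}^p\times\mathbb{R}^N \text{ such that } \sum_{j=1}^{p}\sum_{k=1}^{N} A_{ijk}\lambda_j\xi_k = 0 \text{ for } i=1,\dots,q\Big\}.$$

The set of possible jumps compatible with the balance equations, written in the sense of distributions, is then the following characteristic set $\Lambda$:

$$\Lambda = \Big\{\lambda\in\mathbb{R}^p \text{ such that there exists } \xi\in\mathbb{R}^N\setminus\{0\} \text{ with } \sum_{j=1}^{p}\sum_{k=1}^{N} A_{ijk}\lambda_j\xi_k = 0 \text{ for } i=1,\dots,q\Big\}.$$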
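
Membership in $\Lambda$ reduces to linear algebra. For fixed $\lambda$, the $q\times N$ matrix $M(\lambda)_{ik}=\sum_j A_{ijk}\lambda_j$ must have a nonzero kernel vector $\xi$, i.e. rank less than $N$.

The sketch below uses, as an assumed example, the div-curl system in $\mathbb{R}^2$. The unknown is $U=(D_1,D_2,E_1,E_2)$, with balance operators $\partial_1D_1+\partial_2D_2$ and $\partial_1E_2-\partial_2E_1$. There $\det M(\lambda) = -D\cdot E$, so $\Lambda=\{D\cdot E=0\}$. The helper names are illustrative.

```python
def char_matrix(A, lam):
    """M(lam)[i][k] = sum_j A[i][j][k] * lam[j]; a q x N matrix."""
    q, p, N = len(A), len(A[0]), len(A[0][0])
    return [[sum(A[i][j][k] * lam[j] for j in range(p)) for k in range(N)] for i in range(q)]

def rank(M, tol=1e-10):
    """Rank by forward Gaussian elimination with partial pivoting."""
    M = [list(row) for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[piv][c]) < tol:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def in_Lambda(A, lam):
    """lam is in Lambda iff some xi != 0 solves M(lam) xi = 0, i.e. rank M(lam) < N."""
    N = len(A[0][0])
    return rank(char_matrix(A, lam)) < N

# Div-curl system in R^2 with U = (D1, D2, E1, E2), p = 4, N = 2, q = 2:
#   equation 0: dD1/dx1 + dD2/dx2,   equation 1: dE2/dx1 - dE1/dx2
A = [[[0.0] * 2 for _ in range(4)] for _ in range(2)]
A[0][0][0] = 1.0   # D1 differentiated in x1
A[0][1][1] = 1.0   # D2 differentiated in x2
A[1][3][0] = 1.0   # E2 differentiated in x1
A[1][2][1] = -1.0  # -E1 differentiated in x2

print(in_Lambda(A, (2.0, 1.0, -1.0, 2.0)))   # D.E = 0
print(in_Lambda(A, (1.0, 2.0, 3.0, 4.0)))    # D.E = 11
```

The two calls print `True` and `False`.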

This characteristic set takes into account some information on the balance equations. Using it, one can obtain some new information on limits of quadratic quantities. Assume that $U_\varepsilon$ converges weakly to $U_0$ in $(L^2(\Omega))^p$ and that $U_{\varepsilon,i}U_{\varepsilon,j}$ converges weakly to $U_{0,i}U_{0,j}+R_{ij}$ in the sense of measures. Then one has the following result.

Theorem: If $U_\varepsilon$ satisfies the balance equations and if $Q(\lambda) = \sum_{i,j} q_{ij}\lambda_i\lambda_j$ satisfies $Q(\lambda)\ge 0$ for all $\lambda\in\Lambda$, then $\sum_{i,j} q_{ij}R_{ij}\ge 0$.

This means that $R$, which is always a (measure-valued) nonnegative symmetric matrix, is constrained by the balance equations through the characteristic set $\Lambda$. It must belong to the convex hull of $\{\lambda\otimes\lambda \mid \lambda\in\Lambda\}$.

This theorem, obtained in 1977, extends some results of F. Murat called compensated compactness. Those results generalize a useful remark, the div-curl lemma, that we had made in 1974 in our joint work on homogenization.
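
In the div-curl example, $Q(\lambda)=D\cdot E$ vanishes on $\Lambda$. The theorem therefore applies to both $Q$ and $-Q$ and gives $\sum q_{ij}R_{ij}=0$, so the weak limit of $D_\varepsilon\cdot E_\varepsilon$ is $D_0\cdot E_0$.

The sketch below checks this on hypothetical oscillating fields on the unit square:

- $E_\varepsilon=(\cos(x_1/\varepsilon),0)$, a gradient;
- $D_\varepsilon=(1,\cos(x_1/\varepsilon))$, divergence free.

It contrasts these with $F_\varepsilon=(\cos(x_1/\varepsilon),0)$, which is not divergence free and for which $F_\varepsilon\cdot E_\varepsilon\rightharpoonup 1/2 \ne 0$. All fields depend only on $x_1$, so pairing with a test function of $x_1$ reduces to a one-dimensional integral.

```python
import math

def pairing(g, phi, n=200_000):
    """Midpoint-rule approximation of integral over [0, 1] of g(x1) * phi(x1) dx1.
    The fields below depend only on x1, so the pairing over the unit square reduces to this."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) * phi((k + 0.5) * h) for k in range(n))

eps = 1e-3
phi = lambda x1: math.exp(-x1)                 # hypothetical test function
E = lambda x1: (math.cos(x1 / eps), 0.0)       # gradient of eps*sin(x1/eps): curl-free
D = lambda x1: (1.0, math.cos(x1 / eps))       # divergence-free
F = lambda x1: (math.cos(x1 / eps), 0.0)       # div F = -sin(x1/eps)/eps: not divergence-free
dot = lambda u, v: u[0] * v[0] + u[1] * v[1]

de = pairing(lambda x1: dot(D(x1), E(x1)), phi)   # D0.E0 = (1, 0).(0, 0) = 0
fe = pairing(lambda x1: dot(F(x1), E(x1)), phi)   # weak limit of cos^2 is 1/2
print(de, fe, (1 - math.exp(-1)) / 2)
```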

Although the characteristic set $\Lambda$ does not contain as much information as the characteristic set $V$, this theorem led me to a useful ("classical") approach for studying oscillations in the nonlinear partial differential equations of continuum mechanics.

Oscillations are described by the Young measures $\nu_x$, constrained by:

1. Constitutive relations: the support of $\nu_x$ lies in $K$.

2. Balance equations: $\int Q(k)\,d\nu_x(k) \ge Q\big(\int k\,d\nu_x(k)\big)$ for every quadratic form $Q$ such that $Q(\lambda)\ge 0$ on $\Lambda$.

3. Entropy conditions: one adds any other equations (or inequalities, similar to entropy conditions) implied by the constitutive relations and balance equations, and applies the two preceding points to the new setting.

One important idea was to show that oscillations are impossible in some situations. Indeed, if one can show that the only Young measures satisfying these constraints are Dirac measures, then one deduces that the subsequence converges strongly.

I was successful with this approach for studying a scalar hyperbolic equation, and R. DiPerna made the extension to some hyperbolic systems of conservation laws. This covered some cases where other methods were not powerful enough. Although the amount of technical work associated with this method is considerable, the approach was followed in subsequent work of D. Serre, M. Rascle, C. Morawetz, C. Dafermos and J. Nohel, among others.

If one could not preclude oscillations, the next question was to study their propagation and interaction, and this required characterizing the structure of the Young measures. I developed such an application to semilinear hyperbolic systems in one space dimension. However, I found that in general one needs to use correlations, which cannot be seen by the Young measures, and I did some work in that direction with G. Papanicolaou.

B. Engquist has extended these ideas to computing the propagation and interaction of oscillations in linear or semilinear hyperbolic systems. D. Serre and M. Bonnefille have also extended them to some quasilinear hyperbolic systems which are linearly degenerate.


Defects of the Classical Approach

When working on questions related to nonlinear partial differential equations of continuum mechanics or physics, it is useful to describe the achievements of a given method, but it is also important to list its limitations. The new approach will overcome some of these limitations, but not all of them.

The real interrelation between the characteristic set $V$ and the constitutive manifold $K$ has not been understood. More differential geometry seems necessary to clarify this question, which unfortunately remains open at the moment.

Oscillations are not the only difficulty encountered in nonlinear partial differential equations. One should also consider the complementary question of concentration, as studied in the work of P.L. Lions and of R. DiPerna & A. Majda. The new approach will treat oscillations and concentration in a more unified way.

Weak convergence appears natural for quantities which can be added (they are usually coefficients of differential forms). For other quantities something different has to be used, and for these the adjective averaged should be replaced by effective. This point of view has been developed in the theory of homogenization, whose purpose is to understand effective properties of mixtures; periodicity assumptions should be avoided in this context. Most of my preceding ideas were developed in connection with homogenization.

The simple question of computing effective coefficients for layered media, which is well understood, shows the importance of adding to the preceding description at least one geometric parameter describing the orientation of the layers. Apart from the technical question of finding optimal bounds for effective coefficients, where compensated compactness does play an important role, one goal to keep in mind is understanding the evolution of mixtures. The preceding mathematical tools could not see both the $x$ and the $\xi$ variables, nor take advantage of the complete characteristic set $V$. The new tool will correct this defect and enable us to address some of the open questions (but obviously not all of them).

Propagation of oscillations and concentration cannot be seen by the Young measures. This is a different matter from the propagation of singularities studied by specialists of linear partial differential equations. One idea, which I tried for a long time without success, was to split the Young measures in directions so that one could write some kind of transport equation. The new tool will construct a similar object, but not from the Young measures, and so it will contain different information. A compensation for this loss of information will be the useful properties of the new measures. I have called them H-measures as a reminder of their origin in homogenization theory.


A New Mathematical Tool: H-Measures

For a subsequence converging weakly to 0 in $(L^2(\Omega))^p$, we will define a family of measures $\mu$ indexed by $x\in\Omega$ and $\xi\in S^{N-1}$. They give a better description of the oscillations and their propagation through some kind of microlocal H-calculus, which lets us make better use of the balance equations. They do not contain all the information of the Young measures based on the constitutive relations, but they improve the compensated compactness theorem.
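
For $\varphi_1,\varphi_2\in C_c(\Omega)$ and $\psi\in C(S^{N-1})$, one defines the H-measure $\mu$ with entries $\mu_{jk}$ by computing the following limits as $\varepsilon\to 0$ (after extracting some subsequence):

$$\langle \mu_{jk}, \varphi_1(x)\varphi_2^*(x)\psi(\xi)\rangle = \lim_{\varepsilon\to 0}\int_{\mathbb{R}^N} \mathcal{F}(\varphi_1 U_{\varepsilon,j})(\xi)\,\big(\mathcal{F}(\varphi_2 U_{\varepsilon,k})(\xi)\big)^*\,\psi\!\left(\frac{\xi}{|\xi|}\right)d\xi,$$

where $\mathcal{F}$ denotes the Fourier transform ($\mathcal{F}f(\xi) = \int f(x)e^{-2i\pi x\cdot\xi}\,dx$) and $z^*$ denotes the complex conjugate of $z$.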

There is indeed something to prove here for this definition to make sense: if $\varphi_1$ and $\varphi_2$ have disjoint supports, then the above limit is 0. This is why the H-measures are only defined for sequences converging weakly to 0.

There is some analogy with Hörmander's definition of the wave front set of a distribution, but the framework here is entirely different. We deal with sequences and are interested in the difference between weak convergence and strong convergence; strong convergence is the case where the H-measure is 0.

Because the functions $\psi$ are homogeneous of degree 0, the H-measures cannot distinguish between different frequencies propagating in the same direction. What they do describe is that direction of propagation.

The H-measures will give us some results in small amplitude homogenization which were traditionally obtained using 2-point correlations. Such correlations can easily be defined in a periodic or in a random setting, but not in a general case without the use of a characteristic length. The H-measures do not contain the information on 2-point correlations, but they can be deduced from them by a singular integral. M. Avellaneda, who had worked in the random case, pointed this out to me at a later meeting.

At the same time, G. Milton pointed out that one could not expect to find results about scattering without constructing similar measures using 3-point correlations. It is not clear whether such measures can be constructed while also retaining the other property that I wanted, namely using the balance equations to give some information on propagation.

Example 1: Periodic modulation.

Let $U_\varepsilon(x) = V(x,x/\varepsilon)$ with $V(x,y)$ having period 1 in each coordinate $y_i$. Then (under some regularity hypothesis) $V$ admits a Fourier decomposition $V(x,y) = \sum_m v_m(x)e^{2i\pi m\cdot y}$, and the H-measure is equal to $\sum_m |v_m(x)|^2\,\delta_{m/|m|}$. The sum is taken over $m\in\mathbb{Z}^N\setminus\{0\}$; $v_0 = 0$ by hypothesis.
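
On a periodic grid, the Fourier coefficients $v_m$ are exactly what a discrete Fourier transform returns, so the formula of Example 1 can be evaluated directly. The sketch below assumes a hypothetical $x$-independent profile $V(y)=\cos 2\pi(y_1+2y_2)+\tfrac12\sin 2\pi y_1$. It takes $\varepsilon=1/k$ on the unit torus and groups $|v_m|^2$ by the direction $m/|m|$. To stay dependency free it uses a plain $O(M^4)$ DFT.

```python
import cmath
import math
from collections import defaultdict

def V(y1, y2):
    """Hypothetical 1-periodic profile with zero mean (v_0 = 0), independent of x."""
    return math.cos(2 * math.pi * (y1 + 2 * y2)) + 0.5 * math.sin(2 * math.pi * y1)

def direction_weights(k, M=16):
    """Sum |v_m|^2 over each direction m/|m| for U_eps(x) = V(x/eps), eps = 1/k,
    using a direct DFT of samples on an M x M grid of the unit torus."""
    grid = [j / M for j in range(M)]
    samples = [[V(k * x1, k * x2) for x2 in grid] for x1 in grid]
    weights = defaultdict(float)
    for m1 in range(-M // 2 + 1, M // 2):
        for m2 in range(-M // 2 + 1, M // 2):
            if (m1, m2) == (0, 0):
                continue
            c = sum(samples[a][b] * cmath.exp(-2j * math.pi * (m1 * grid[a] + m2 * grid[b]))
                    for a in range(M) for b in range(M)) / M**2
            if abs(c) ** 2 > 1e-12:
                r = math.hypot(m1, m2)
                weights[(round(m1 / r, 6), round(m2 / r, 6))] += abs(c) ** 2
    return dict(weights)

for k in (1, 2, 3):
    print(k, direction_weights(k))
```

For this $V$ the weights are $1/4$ at $\pm(1,2)/\sqrt5$ and $1/16$ at $\pm(1,0)$, and they are the same for $k=1,2,3$. This illustrates that the H-measure records directions but not frequencies.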

Example 2: Concentration effects.

If $U_\varepsilon(x) = \varepsilon^{-N/2}f(x/\varepsilon)$ for an $L^2$ function $f$, then the H-measure is equal to $\delta_0\otimes\nu$, where $\nu$ has a surface density $\nu(\xi) = \int_0^\infty |\mathcal{F}f(t\xi)|^2\,t^{N-1}\,dt$.
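
In dimension $N=1$ the sphere $S^0$ is $\{-1,+1\}$, so $\nu$ reduces to two weights $\nu(\pm1)=\int_0^\infty|\mathcal{F}f(\pm t)|^2\,dt$. By Plancherel's theorem these add up to $\|f\|_{L^2}^2$.

The small check below uses a hypothetical modulated Gaussian $f(x)=e^{-\pi x^2}e^{2i\pi a x}$. With the convention above, its transform is $\mathcal{F}f(\xi)=e^{-\pi(\xi-a)^2}$. The shift $a>0$ puts more weight on the direction $+1$.

```python
import math

a = 0.3  # hypothetical modulation frequency of f(x) = exp(-pi x^2) * exp(2 i pi a x)

def Ff(xi):
    """Fourier transform of f with the convention F f(xi) = integral of f(x) exp(-2 i pi x xi) dx."""
    return math.exp(-math.pi * (xi - a) ** 2)

def nu(sign, T=10.0, n=100_000):
    """nu(+1) or nu(-1): integral over (0, T) of |F f(sign * t)|^2 dt (t^(N-1) = 1 for N = 1)."""
    h = T / n
    return h * sum(Ff(sign * (k + 0.5) * h) ** 2 for k in range(n))

nu_plus, nu_minus = nu(+1), nu(-1)
norm_sq = 1 / math.sqrt(2)   # integral of |f|^2 = integral of exp(-2 pi x^2) dx
closed_plus = (1 + math.erf(a * math.sqrt(2 * math.pi))) / (2 * math.sqrt(2))
print(nu_plus, nu_minus, nu_plus + nu_minus, norm_sq, closed_plus)
```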

Localization Property: Balance equations restrict the H-measure $\mu$ to be a combination of hermitian nonnegative matrices of the form $\lambda\otimes\lambda^*$, with $\lambda\in\mathbb{C}^p$ such that $(\lambda,\xi)$ belongs to the characteristic set $V$.

Propagation Property: Assume that $U_\varepsilon \rightharpoonup 0$ in $(L^2(\Omega))^p$ weak and

$$\sum_{j,k} A_{ijk}\frac{\partial U_{\varepsilon,j}}{\partial x_k} = V_{\varepsilon,i} \rightharpoonup 0 \ \text{in } L^2(\Omega) \text{ weak}.$$

Then, in some cases, one can obtain differential equations satisfied by the H-measure $\mu$ describing the oscillations of $U_\varepsilon$. They involve a creation term using another H-measure $\nu$ describing the joint oscillations of $U_\varepsilon$ and $V_\varepsilon$.

I have not yet worked out which algebraic computations to perform in the general case. In classical examples, however, the method does give what intuition suggests, based on physics or linear theory. For the wave equation, one does obtain the expected transport equation for the H-measures. One does not, however, find any role for caustics here. The H-measures describe only the amplitude and the direction of propagation of oscillations and concentration effects. They therefore cannot feel the changes of phase that happen when crossing caustics.

In the case of a scalar equation, one can see more easily the difference between the static localization property and the dynamic propagation property. It is worth pointing out that our test functions $\varphi_j$ or $\psi$ can be chosen to be only continuous. This lets us derive some kind of pseudo-differential calculus with zero order operators having only a continuous symbol. Let us consider a simple linear equation

$$\sum_{i=1}^{N} a_i(x)\frac{\partial u^\varepsilon}{\partial x_i} + b(x)u^\varepsilon = 0,$$

where the coefficients $a_i$ are of class $C^1$ while $b$ is of class $C^0$.

On one hand, the localization property says that the H-measure $\mu$ of $u^\varepsilon$ (assuming that $u^\varepsilon$ converges weakly to 0) is supported by the zero set of the function $P$ defined by $P(x,\xi) = \sum_j a_j(x)\xi_j$.

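On the other hand, the propagation property states that for every test function $\Phi(x,\xi)$ of class $C^1$ on $\Omega\times S^{N-1}$ with compact support in $x$, one has

$$\Big\langle \mu, \sum_{k=1}^{N}\Big(\frac{\partial P}{\partial x_k}\frac{\partial \Phi}{\partial \xi_k} - \frac{\partial P}{\partial \xi_k}\frac{\partial \Phi}{\partial x_k}\Big)\Big\rangle + 2\langle \mu, b\Phi\rangle = 0,$$

where $\Phi$ is extended as a homogeneous function of degree 0 in $\xi$. This implies that oscillations and concentration effects do propagate along bicharacteristic rays.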
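
The bicharacteristic rays are the integral curves of the Hamiltonian system $\dot x = \partial P/\partial\xi = a(x)$, $\dot\xi = -\partial P/\partial x$. $P$ is conserved along them, so rays starting in the zero set of $P$, where the localization property puts $\mu$, stay in it.

The minimal sketch below assumes a hypothetical field $a(x)=(1,x_1)$ in $\mathbb{R}^2$ and uses a classical fourth-order Runge-Kutta step. For this field, the exact ray from $x=(0,0)$, $\xi=(0,1)$ is $x=(s,s^2/2)$, $\xi=(-s,1)$.

```python
def a_field(x):
    """Hypothetical C^1 coefficients a(x) = (1, x1)."""
    return (1.0, x[0])

def dP_dx(x, xi):
    """dP/dx_k = sum_j (da_j/dx_k) xi_j; for a(x) = (1, x1) only da_2/dx_1 = 1 is nonzero."""
    return (xi[1], 0.0)

def P(state):
    """Symbol P(x, xi) = a(x) . xi evaluated at state = (x1, x2, xi1, xi2)."""
    a = a_field(state[:2])
    return a[0] * state[2] + a[1] * state[3]

def rhs(state):
    x, xi = state[:2], state[2:]
    dx = a_field(x)                          # dx/ds  =  dP/dxi
    dxi = tuple(-g for g in dP_dx(x, xi))    # dxi/ds = -dP/dx
    return dx + dxi

def rk4(state, h, steps):
    """Classical fourth-order Runge-Kutta integration of the bicharacteristic system."""
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h / 6.0 * (c1 + 2 * c2 + 2 * c3 + c4)
                      for s, c1, c2, c3, c4 in zip(state, k1, k2, k3, k4))
    return state

start = (0.0, 0.0, 0.0, 1.0)      # P(start) = 0: a point of the characteristic set
end = rk4(start, h=0.01, steps=100)
print(end, P(end))
```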

When $u^\varepsilon$ is a solution of several first order equations, the H-measure $\mu$ is supported by the intersection of the zero sets of the symbols of the first order operators. It also satisfies several equations, which may happen to be incompatible constraints forcing $\mu$ to be 0, thus precluding oscillations.

Applications to Homogenization

Effective properties of mixtures cannot be obtained from the knowledge of averages or weak limits, except in special situations like layered materials, and they cannot be deduced from H-measures either. In some cases, however, the introduction of H-measures can help reformulate the problem. For example, H-measures are useful in deriving good bounds for effective coefficients. My results in this direction are, however, too fragmentary at the moment to explain the best way to use them.

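H-measures are the right mathematical tool for studying small amplitude homogenization. The formulas obtained are analogous to some that were known to specialists in a periodic or a random setting. Consider the diffusion (of electricity or heat) in a mixture of materials with nearby conductivities,

$$-\operatorname{div}\big(A^\varepsilon(x;\gamma)\operatorname{grad}u^\varepsilon(x;\gamma)\big) = f \quad\text{in } \Omega,$$

where

$$A^\varepsilon(x;\gamma) = A^0(x) + \gamma B^\varepsilon(x) + \gamma^2C^\varepsilon(x) + o(\gamma^2), \quad\text{with } B^\varepsilon\rightharpoonup B^0 \text{ and } C^\varepsilon\rightharpoonup C^0 \text{ weakly}.$$

Then the effective conductivity has the form

$$A^{\mathrm{eff}}(x;\gamma) = A^0(x) + \gamma B^0(x) + \gamma^2\big(C^0(x) - M(x)\big) + o(\gamma^2),$$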

where the correction term $M(x)$, which is nonnegative, can be computed from the H-measure associated to the sequence $\{B^\varepsilon - B^0\}$ by a specific integration in $\xi$ on $S^{N-1}$. For example, assume that $B^\varepsilon$ is isotropic, so that there is only one scalar H-measure $\nu$. The formula is then

$$M(x) = \int_{S^{N-1}} \frac{\xi\otimes\xi}{A^0(x)\xi\cdot\xi}\,d\nu(x,\xi).$$

This procedure extends to other models like linear elasticity, and it can give relations between different effective properties of a given mixture.
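
Layered media give a check of the $\gamma^2$ term, since there the exact effective coefficients are known. Across the layers the effective conductivity is the harmonic mean; along them it is the arithmetic mean.

The sketch makes these assumptions:

- $A^0=a_0 I$ with $a_0=1$;
- $B^\varepsilon=\cos(2\pi x_1/\varepsilon)\,I$;
- $C^\varepsilon=0$.

Expanding the harmonic mean predicts a correction $M_{11}=\operatorname{var}(\cos)/a_0 = 1/2$ normal to the layers and no correction along them. This is consistent with the H-measure of $B^\varepsilon$ putting mass $1/4$ at $\pm e_1$. The exact harmonic mean here is $\sqrt{1-\gamma^2}$, which agrees with $1-\gamma^2/2$ up to $O(\gamma^4)$.

```python
import math

def arithmetic_mean(g, n=10_000):
    """Midpoint-rule average of g over one period [0, 1)."""
    return sum(g((k + 0.5) / n) for k in range(n)) / n

def harmonic_mean(g, n=10_000):
    """Harmonic mean of g over one period [0, 1)."""
    return 1.0 / arithmetic_mean(lambda y: 1.0 / g(y), n)

a0 = 1.0                                          # hypothetical A^0 = a0 * I
for gamma in (0.2, 0.1, 0.05):
    cond = lambda y, g=gamma: a0 + g * math.cos(2 * math.pi * y)   # one period of the layers
    across = harmonic_mean(cond)                  # exact effective value normal to the layers
    along = arithmetic_mean(cond)                 # exact effective value along the layers
    predicted = a0 - gamma ** 2 * 0.5 / a0        # A^0 + gamma*B^0 + gamma^2*(C^0 - M_11), B^0 = C^0 = 0
    print(gamma, across, math.sqrt(1 - gamma ** 2), predicted, along)
```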

H-measures can also give the exact effective behaviour in some cases where the oscillating coefficients do not appear in the highest order terms. The following example, which has some similarities with the stationary Navier-Stokes equations, is instructive. Let $u^\varepsilon$ be a velocity field solution of

$$-\Delta u^\varepsilon + u^\varepsilon\times\operatorname{curl}v^\varepsilon + \operatorname{grad}p^\varepsilon = f \quad\text{and}\quad \operatorname{div}u^\varepsilon = 0 \quad\text{in } \Omega,$$

where $v^\varepsilon\rightharpoonup v^0$ weakly. Then the effective equation is

$$-\Delta u^0 + u^0\times\operatorname{curl}v^0 + Mu^0 + \operatorname{grad}p^0 = f \quad\text{and}\quad \operatorname{div}u^0 = 0 \quad\text{in } \Omega,$$

and the correction term can be computed from the H-measure associated to the sequence $\{v^\varepsilon - v^0\}$ by a specific integration on $S^{N-1}$. The formula is
M(x)  =  - 

sN-i  k  k.i 

Such formulas should be useful for understanding turbulence effects, at least at their onset.

Conclusion

There are many other areas where I plan to use this simple new approach based on H-measures; stability questions in continuum mechanics are one of them. I also have a bolder conjecture: H-measures may be one of the missing mathematical tools needed to explain why some rules invented by physicists work so well in spite of their irrational derivations. Physicists invented quantum mechanics to understand situations that are obviously related to the difficult homogenization problem of wave propagation in mixtures.

The computations of the correction terms in some of the examples described above show striking analogies with computations done by following dogmatic rules of quantum mechanics. Mine, however, were entirely deductive and part of a general program of study of nonlinear partial differential equations.

More detailed proofs of the constructions sketched here will appear elsewhere.